1 Reply Latest reply on Dec 30, 2008 11:02 AM by marx3

    How to use other Analyzer?

    fvalente

      I'm using Hibernate Search. I can index and search properly. Now I'm trying to use the BrazilianAnalyzer, but it's not working.


      Here is my code:


      @AutoCreate
      @Name("searchController")
      @Scope(ScopeType.CONVERSATION)
      public class SearchController
      {
          @In
          private FullTextEntityManager entityManager;
      
          public void search(SearchData searchData)
          {
              String[] searchFields = {"fullName", "description", "summary", "title",
                       "experience", "nickname", "qualifications", "keywords"}; 
              
              MultiFieldQueryParser parser = new MultiFieldQueryParser(searchFields, new BrazilianAnalyzer());
              parser.setAllowLeadingWildcard(true);
              
              try
              {
                  String[] alist = searchPattern.split(" ");
                  String queryString = "";

                  for (int i = 0; i < alist.length; i++)
                  {
                      // Fuzzy-match each term; the trailing space keeps the terms separate.
                      queryString += alist[i] + "~0.8 ";
                  }

                  luceneQuery = parser.parse(queryString);
              }
              catch (ParseException e)
              {
                  luceneQuery = null;
              }
          }
          
          public Pagination<StructuredDocument> getStructuredDocumentSearchResults()
          {
              if (luceneQuery != null)
              {
                  FullTextQuery query = entityManager.createFullTextQuery(luceneQuery, StructuredDocument.class);
                       
                  query.enableFullTextFilter("structuredDocumentPublicStatus").setParameter("statusList", (new StructuredDocument()).getPublicStatusList());
                  
                  return new Pagination<StructuredDocument>(this.filterByCommunity(query.getResultList(), identity.getCurrentCommunity()), getPageSize());
              }
              
              return new YouKnowPagination<StructuredDocument>();
          }
      }



      The search works perfectly, but it is not using the BrazilianAnalyzer; it uses the StandardAnalyzer instead. I know this because the Brazilian stop words from the BrazilianAnalyzer have no effect. For instance, if I search for "de", which is a Brazilian Portuguese stop word, I get several results, but if I search for "the" I get no results.


      The only way I found to use the BrazilianAnalyzer was to set hibernate.search.analyzer in persistence.xml to org.apache.lucene.analysis.br.BrazilianAnalyzer.


      The problem is that I need to use a different analyzer when the user views the site in a different language, e.g. if the person is viewing it in English I want to use StandardAnalyzer instead of BrazilianAnalyzer.


      Does anyone have an idea of how to programmatically change the analyzer?

        • 1. Re: How to use other Analyzer?
          marx3
          Stop words are removed from the text at indexing time, so switching to a different analyzer only at search time is inconsistent with what is stored in the index.
          There are two problems:
          1) how to recognize which language the indexed text is written in (to choose the proper analyzer)
          2) how to store indexes for different languages
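
To make the first point concrete, here is a minimal sketch in plain Java. ToyIndex is a hypothetical stand-in for a real index, not the Lucene or Hibernate Search API, and the stop word lists passed to it are shortened assumptions. It only shows that whether a word can be found at all depends on the stop word list that was in effect when the text was indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index (hypothetical, for illustration only). */
final class ToyIndex {
    private final Set<String> stopWords;
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** The stop words given here are applied when documents are added. */
    ToyIndex(String... stopWords) {
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }

    /** Lowercases, splits on non-letters, and drops this index's stop words. */
    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Indexes a document; stop words never reach the postings map. */
    void add(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    /** Looks up one raw term, with no query-time analysis at all. */
    boolean contains(String term) {
        return postings.containsKey(term);
    }
}
```

If "The casa de Maria" is added to new ToyIndex("the", "of", "and"), then contains("de") is true and contains("the") is false, which is the same symptom described in the question. Added to new ToyIndex("de", "a", "o") instead, the result is reversed.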

          The way I handle this is to have as many "indexes" as the number of languages you plan to support. Every indexed field is indexed several times, under fields with different names. For example:

          @Fields( {
          @Field(name = "ALL_en", analyzer = @Analyzer(impl = StandardAnalyzer.class)),
          @Field(name = "ALL_de", analyzer = @Analyzer(impl = GermanAnalyzer.class))})

          Then I let the user choose the language to search in. For example, to search only in English, I search only the field ALL_en and ignore ALL_de. Of course, when searching I also use the analyzer for the chosen language.
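
The selection step could be sketched roughly as follows, in plain Java without any Lucene dependency. LanguageFields, the "ALL_pt" field, and the fallback to English are assumptions made for illustration; only the ALL_xx naming and the two analyzer classes come from this thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Maps a UI locale to a per-language index field and analyzer class name (hypothetical helper). */
final class LanguageFields {
    // Assumed mapping from language code to analyzer class name.
    private static final Map<String, String> ANALYZERS = new HashMap<>();
    static {
        ANALYZERS.put("en", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        ANALYZERS.put("pt", "org.apache.lucene.analysis.br.BrazilianAnalyzer");
    }

    // Assumption: unsupported languages fall back to English.
    static final String DEFAULT_LANGUAGE = "en";

    /** Returns the supported language code for the locale, or the default. */
    static String normalize(Locale locale) {
        String language = locale.getLanguage();
        return ANALYZERS.containsKey(language) ? language : DEFAULT_LANGUAGE;
    }

    /** Name of the index field to search, following the ALL_xx convention. */
    static String fieldFor(Locale locale) {
        return "ALL_" + normalize(locale);
    }

    /** Class name of the analyzer that should parse the query for this locale. */
    static String analyzerClassFor(Locale locale) {
        return ANALYZERS.get(normalize(locale));
    }
}
```

In a real application the class name would presumably be replaced by an analyzer instance, and both values would be handed to the query parser in place of the fixed field list and analyzer used in the question.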

          One more note: I index all fields under the same name, ALL_xx, which allows me to easily implement a search across the whole database; that is almost impossible if you index every field under a different name.