2.2.1.2.3. Property-level analyzers

Example

This configuration section defines how a property is analyzed. If an analyzer is configured for a property, that analyzer is used both for indexing and for searching that property. For example:


<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <analyzers>
    <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
      <property>mytext</property>
    </analyzer>
    <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
      <property>mytext2</property>
    </analyzer>
  </analyzers>
</configuration>

The configuration above means that the property "mytext" is indexed (and searched) with the Lucene KeywordAnalyzer throughout the entire workspace, and the property "mytext2" with the WhitespaceAnalyzer. Property-level analyzers are particularly useful when different properties hold text in different languages.

The WhitespaceAnalyzer splits the property value into tokens at whitespace, while the KeywordAnalyzer treats the whole value as a single token.
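To make the difference concrete, here is a minimal Java sketch that models the two tokenization strategies with plain string handling. It is a simplified stand-in, not Lucene's analyzer API: the class and method names (`AnalyzerTokenizationSketch`, `keywordTokens`, `whitespaceTokens`, `matches`) are hypothetical, and matching is reduced to checking whether the search term equals one of the indexed tokens.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of two Lucene analyzers; this is NOT the Lucene API.
class AnalyzerTokenizationSketch {

    // Models KeywordAnalyzer: the whole property value is a single token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Models WhitespaceAnalyzer: split on runs of whitespace, case is kept.
    static List<String> whitespaceTokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Simplification: a search term matches if it equals one indexed token.
    static boolean matches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        String value = "testing my analyzers";
        System.out.println(keywordTokens(value));                   // [testing my analyzers]
        System.out.println(whitespaceTokens(value));                // [testing, my, analyzers]
        System.out.println(matches(keywordTokens(value), "my"));    // false
        System.out.println(matches(whitespaceTokens(value), "my")); // true
    }
}
```

In this model, a property indexed with the KeywordAnalyzer only matches a search for its complete value, whereas a WhitespaceAnalyzer-indexed property also matches its individual words.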

Characteristics of node scope searches

When using analyzers, you may encounter unexpected behavior when searching within a property compared to searching within a node scope. The reason is that the node scope always uses the global analyzer.

Suppose that the "mytext" property contains the text "testing my analyzers", and that you have not configured any analyzer for the "mytext" property (nor changed the default analyzer in SearchIndex).

For example, if your query is as follows:

xpath = "//*[jcr:contains(mytext,'analyzer')]"

With the default analyzers, this XPath query does not return the node that holds the property above.

A search on the node scope

xpath = "//*[jcr:contains(.,'analyzer')]"

does not return a hit either. Keep in mind that specific analyzers can only be set on node properties; the node scope is always indexed and analyzed with the analyzer defined globally in the SearchIndex element.

Now, if you change the analyzer used to index the "mytext" property above to


<analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
    <property>mytext</property>
</analyzer>

and run the same searches again, then the query

xpath = "//*[jcr:contains(mytext,'analyzer')]"

returns a hit, because word stemming lets "analyzer" match "analyzers".

The other search,

xpath = "//*[jcr:contains(.,'analyzer')]"

still does not return a result, because the node scope is indexed with the global analyzer, which in this case does not apply any word stemming.
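Both scenarios can be reproduced in a small toy model. The following Java sketch is an illustration under stated assumptions, not eXo JCR or Lucene code: `globalAnalyzer` is a hypothetical stand-in for a non-stemming global analyzer (lowercasing plus whitespace splitting, which is not necessarily what the real default analyzer does), `stemmingAnalyzer` is a hypothetical stand-in for the GermanAnalyzer that uses a naive "drop a trailing s" rule, and a search matches when every token of the analyzed search text occurs among the indexed tokens. The property search uses the analyzer configured for that property, while the node-scope search always uses the global one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of property-scope vs node-scope searching; NOT eXo JCR or Lucene code.
class NodeScopeSketch {

    // Hypothetical non-stemming global analyzer: lowercase, split on whitespace.
    static List<String> globalAnalyzer(String text) {
        String t = text.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new ArrayList<String>() : Arrays.asList(t.split("\\s+"));
    }

    // Hypothetical stemming analyzer: global tokens plus a naive rule
    // that drops a trailing "s" (a crude stand-in for real stemming).
    static List<String> stemmingAnalyzer(String text) {
        List<String> out = new ArrayList<>();
        for (String token : globalAnalyzer(text)) {
            out.add(token.length() > 1 && token.endsWith("s")
                    ? token.substring(0, token.length() - 1)
                    : token);
        }
        return out;
    }

    // jcr:contains(prop, term): value and search text both go through the
    // analyzer configured for the property, falling back to the global one.
    static boolean containsInProperty(Map<String, Function<String, List<String>>> propertyAnalyzers,
                                      String propName, String propValue, String term) {
        Function<String, List<String>> analyzer =
                propertyAnalyzers.getOrDefault(propName, NodeScopeSketch::globalAnalyzer);
        return analyzer.apply(propValue).containsAll(analyzer.apply(term));
    }

    // jcr:contains(., term): the node scope always uses the global analyzer.
    static boolean containsInNodeScope(String propValue, String term) {
        return globalAnalyzer(propValue).containsAll(globalAnalyzer(term));
    }

    public static void main(String[] args) {
        String value = "testing my analyzers";
        Map<String, Function<String, List<String>>> config = new HashMap<>();

        // No property-level analyzer: both searches miss.
        System.out.println(containsInProperty(config, "mytext", value, "analyzer")); // false
        System.out.println(containsInNodeScope(value, "analyzer"));                  // false

        // Stemming analyzer configured for "mytext": only the property search hits.
        config.put("mytext", NodeScopeSketch::stemmingAnalyzer);
        System.out.println(containsInProperty(config, "mytext", value, "analyzer")); // true
        System.out.println(containsInNodeScope(value, "analyzer"));                  // false
    }
}
```

The node-scope check never consults the per-property map, which is why changing the analyzer for "mytext" cannot affect the `jcr:contains(., ...)` result in this model.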

In conclusion, be aware that when you use analyzers for specific properties, a search text may produce a hit in a property but no hit in the node scope of the node that holds that property.

Note

Both index rules and index aggregates influence how content is indexed in JCR. If you change the configuration, existing content is not automatically re-indexed according to the new rules. Therefore, you have to re-index the content manually whenever you change the configuration.

Copyright ©. All rights reserved. eXo Platform SAS