2.2.1.1.2. Fulltext search

Property content indexing

Each property of a node (if it is indexable) is processed by a Lucene analyzer and stored in the Lucene index. This is called indexing a property. After that, you can perform a fulltext search over the indexed properties.

Lucene analyzers

The purpose of an analyzer is to transform every string stored in the index into a well-defined form. The same analyzer is applied at search time, so that the query string is transformed in the same way as the indexed text.

Therefore, performing the same query using different analyzers can return different results.

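Now, let's see how the same string is transformed by different analyzers. The first table uses the string "The quick brown fox jumped over the lazy dogs"; the second uses "XY&Z Corporation - xyz@example.com".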

Analyzer	Parsed
org.apache.lucene.analysis.WhitespaceAnalyzer	[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
org.apache.lucene.analysis.SimpleAnalyzer	[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
org.apache.lucene.analysis.StopAnalyzer	[quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
org.apache.lucene.analysis.standard.StandardAnalyzer	[quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
org.apache.lucene.analysis.snowball.SnowballAnalyzer	[quick] [brown] [fox] [jump] [over] [lazi] [dog]
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop words - JCR default analyzer)	[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]

Analyzer	Parsed
org.apache.lucene.analysis.WhitespaceAnalyzer	[XY&Z] [Corporation] [-] [xyz@example.com]
org.apache.lucene.analysis.SimpleAnalyzer	[xy] [z] [corporation] [xyz] [example] [com]
org.apache.lucene.analysis.StopAnalyzer	[xy] [z] [corporation] [xyz] [example] [com]
org.apache.lucene.analysis.standard.StandardAnalyzer	[xy&z] [corporation] [xyz@example] [com]
org.apache.lucene.analysis.snowball.SnowballAnalyzer	[xy&z] [corpor] [xyz@exampl] [com]
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop words - JCR default analyzer)	[xy&z] [corporation] [xyz@example] [com]
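
The first three rows of each table can be reproduced with a small plain-Java sketch. It does not use Lucene: the class ToyAnalyzers and its methods are hypothetical stand-ins that only imitate the tokenizing rules of WhitespaceAnalyzer, SimpleAnalyzer and StopAnalyzer, and the stop word list is an abbreviated assumption. StandardAnalyzer and SnowballAnalyzer apply grammar-based tokenizing and stemming, which are not imitated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation of three analyzers from the tables above.
// These helpers are hypothetical; they are not the Lucene classes.
class ToyAnalyzers {

    // Abbreviated stop word list (assumption); Lucene's built-in English list is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "is", "of", "the", "to"));

    // Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: split on every non-letter character, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: the SimpleAnalyzer tokens with stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {
            "The quick brown fox jumped over the lazy dogs",
            "XY&Z Corporation - xyz@example.com"
        };
        for (String s : samples) {
            System.out.println("whitespace: " + whitespace(s));
            System.out.println("simple:     " + simple(s));
            System.out.println("stop:       " + stop(s));
        }
    }
}
```

Running main prints, for each sample string, the same token sequences as the WhitespaceAnalyzer, SimpleAnalyzer and StopAnalyzer rows above.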

Note

StandardAnalyzer is the default analyzer in the JCR search engine, but it is configured without stop words.

You can assign your analyzer as described in Search Configuration.

How are different properties indexed?

Properties are indexed in different ways depending on their type, and the type determines whether a property can be found by a fulltext search on that specific property.

Only two property types are indexed as fulltext searchable: STRING and BINARY.

Property Type	Fulltext search by all properties	Fulltext search by exact property
STRING	YES	YES
BINARY	YES	NO

For example, suppose you have the jcr:data property, which is BINARY. It is stored correctly, but a query like the following will never find any string in it:

SELECT * FROM nt:resource WHERE CONTAINS(jcr:data, 'some string')

BINARY is not searchable by fulltext search on the exact property, but the following query returns a result if the node contains the searched data.

SELECT * FROM nt:resource WHERE CONTAINS( * , 'some string')
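
The difference between the two queries can be pictured with a toy in-memory model. This is only a sketch of the rule in the table above, not the eXo JCR implementation: the class ToyFulltextIndex and its methods are hypothetical, and it uses plain substring matching instead of analyzed tokens.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of the STRING/BINARY table above; not the eXo JCR implementation.
class ToyFulltextIndex {

    enum PropertyType { STRING, BINARY }

    // node name -> text reachable through CONTAINS(*, ...)
    private final Map<String, StringBuilder> nodeScope = new HashMap<>();
    // "node/property" -> text reachable through CONTAINS(property, ...)
    private final Map<String, String> propertyScope = new HashMap<>();

    void index(String node, String property, PropertyType type, String text) {
        String normalized = text.toLowerCase(Locale.ROOT);
        // Both STRING and BINARY text feed the node-wide fulltext field.
        nodeScope.computeIfAbsent(node, k -> new StringBuilder())
                 .append(' ').append(normalized);
        // Only STRING values are also indexed under their own property name.
        if (type == PropertyType.STRING) {
            propertyScope.put(node + "/" + property, normalized);
        }
    }

    // Models CONTAINS(*, term).
    boolean containsAnywhere(String node, String term) {
        StringBuilder text = nodeScope.get(node);
        return text != null && text.indexOf(term.toLowerCase(Locale.ROOT)) >= 0;
    }

    // Models CONTAINS(property, term).
    boolean containsInProperty(String node, String property, String term) {
        String text = propertyScope.get(node + "/" + property);
        return text != null && text.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ToyFulltextIndex idx = new ToyFulltextIndex();
        idx.index("file", "jcr:data", PropertyType.BINARY, "some string inside a document");
        System.out.println(idx.containsInProperty("file", "jcr:data", "some string")); // false
        System.out.println(idx.containsAnywhere("file", "some string"));               // true
    }
}
```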

Fulltext search query examples

Different analyzers in action

First of all, fill the repository with nodes that have the mixin type 'mix:title' and different values of the jcr:description property.

root
- document1 (mix:title) jcr:description = "The quick brown fox jumped over the lazy dogs."
- document2 (mix:title) jcr:description = "Brown fox live in forest."
- document3 (mix:title) jcr:description = "Fox is a nice animal."

Let's take a closer look at the effect of analyzers. In the first case, the default JCR settings are used, so, as mentioned above, the string "The quick brown fox jumped over the lazy dogs" is transformed into the token set {[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]}.

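import javax.jcr.NodeIterator;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

// make SQL query (workspace is the javax.jcr.Workspace of the current session)
QueryManager queryManager = workspace.getQueryManager();
String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'the')";

// create query
Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result
QueryResult result = query.execute();
NodeIterator nodes = result.getNodes();

The NodeIterator will return "document1".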

Now change the default analyzer to org.apache.lucene.analysis.StopAnalyzer. Refill the repository (the new analyzer must process the node properties) and run the same query again. It returns nothing, because stop words such as "the" are excluded from the parsed token set.
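
The two runs can be imitated without a repository. The following is a minimal sketch in plain Java, assuming a simplified analyzer that lowercases, splits on non-letters and optionally drops a short, hypothetical stop word list. The class StopWordSearchDemo and its methods are illustrative only and are not part of JCR or Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy imitation of the scenario above; not JCR or Lucene code.
class StopWordSearchDemo {

    // Abbreviated stop word list (assumption).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "in", "is", "the"));

    // Lowercase, split on non-letters, optionally drop stop words.
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !(removeStopWords && STOP_WORDS.contains(t))) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the names of documents containing every analyzed query token.
    // The same analysis is applied to the query and to the documents.
    static List<String> search(Map<String, String> docs, String query, boolean removeStopWords) {
        List<String> hits = new ArrayList<>();
        List<String> queryTokens = analyze(query, removeStopWords);
        if (queryTokens.isEmpty()) {
            return hits; // the whole query consisted of stop words
        }
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (analyze(e.getValue(), removeStopWords).containsAll(queryTokens)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("document1", "The quick brown fox jumped over the lazy dogs.");
        docs.put("document2", "Brown fox live in forest.");
        docs.put("document3", "Fox is a nice animal.");

        System.out.println(search(docs, "the", false)); // [document1]
        System.out.println(search(docs, "the", true));  // []
    }
}
```

With stop word removal switched on, "the" disappears both from the indexed text and from the query, so nothing is left to match.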