3.1.4.1.6. Fulltext search

Fulltext search by property

Find all nodes that have the 'mix:title' mixin type and whose 'jcr:description' property contains the word "forest".

Repository structure

The repository is filled with nodes of the 'mix:title' mixin type and different values of the 'jcr:description' property.

root
- document1 (mix:title) jcr:description = "The quick brown fox jumps over the lazy dog."
- document2 (mix:title) jcr:description = "The brown fox lives in a forest." // This is the node we want to find
- document3 (mix:title) jcr:description = "The fox is a nice animal."
- document4 (nt:unstructured) jcr:description = "There is the word forest, too."

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// we want to find the document that contains the word "forest"

String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'forest')";

// create query

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// we want to find the document that contains the word "forest"

String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:description, 'forest')]";

// create query

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();
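
The SQL and XPath statements above differ only in syntax. As a minimal sketch, the hypothetical helper below (FulltextStatements is not part of the JCR API) builds both forms from a node type, a property scope and a search term. It assumes that search terms consist only of letters, digits and spaces, so it rejects anything else instead of attempting quote escaping.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that builds the fulltext statements shown in this section. */
final class FulltextStatements {
    // Assumption: only letters, digits and spaces are allowed, so no escaping is needed.
    private static final Pattern SAFE_TERM = Pattern.compile("[\\p{L}\\p{N} ]+");

    private FulltextStatements() {}

    /** JCR SQL form; pass "*" as the property to search all properties. */
    static String sql(String nodeType, String property, String term) {
        requireSafe(term);
        return "SELECT * FROM " + nodeType
            + " WHERE CONTAINS(" + property + ", '" + term + "')";
    }

    /** JCR XPath form; pass "." for all properties or "@name" for a single one. */
    static String xpath(String nodeType, String scope, String term) {
        requireSafe(term);
        return "//element(*," + nodeType + ")[jcr:contains(" + scope + ", '" + term + "')]";
    }

    // Rejects null terms and terms with characters outside the assumed safe set.
    private static void requireSafe(String term) {
        if (term == null || !SAFE_TERM.matcher(term).matches()) {
            throw new IllegalArgumentException("Unsupported search term: " + term);
        }
    }
}
```

For example, FulltextStatements.sql("mix:title", "jcr:description", "forest") yields the SQL statement used in this example, and the resulting string can be passed to queryManager.createQuery() as before.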

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


if(it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "document2".

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

Table content is:

jcr:description	...	jcr:path
The brown fox lives in a forest.	...	/document2

Fulltext search by all properties

Find all nodes with the 'mix:title' mixin type in which any property contains the word 'break'.

Repository structure

The repository is filled with nodes of the 'mix:title' mixin type that have different values of the 'jcr:title' and 'jcr:description' properties.

root
- document1 (mix:title) jcr:title = 'Star Wars' jcr:description = 'Dart rules!!'
- document2 (mix:title) jcr:title = 'Prison break' jcr:description = 'Run, Forest, run ))'
- document3 (mix:title) jcr:title = 'Titanic' jcr:description = 'An iceberg breaks a ship.'

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(*,'break')";

// create query

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// we want to find the documents in which any property contains the word 'break'

String xpathStatement = "//element(*,mix:title)[jcr:contains(.,'break')]";

// create query

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


while(it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "document2" and "document3".

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

Table content is:

jcr:title	jcr:description	...	jcr:path
Prison break	Run, Forest, run ))	...	/document2
Titanic	An iceberg breaks a ship.	...	/document3

Finding nt:file document by content of child jcr:content node

The nt:file node type represents a file. It requires a single child node called jcr:content. This node type represents images and other binary content in a JCRWiki entry. The node type of jcr:content is nt:resource, which represents the actual content of the file.

Find the nodes whose primary type is 'nt:file' and whose 'jcr:content' child node contains the word "cats".

Normally, you cannot find such nodes using just JCR SQL or XPath queries, because the text is stored in the child node rather than in the nt:file node itself. However, you can configure indexing so that nt:file aggregates its jcr:content child node.

So, change indexing-configuration.xml:


<?xml version="1.0"?>

<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.2.dtd">

<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"

               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">

    <aggregate primaryType="nt:file">

        <include>jcr:content</include>

        <include>jcr:content/*</include>

        <include-property>jcr:content/jcr:lastModified</include-property>

    </aggregate>

</configuration>

Now the content of the 'nt:file' and 'jcr:content' ('nt:resource') nodes is concatenated into a single Lucene document. You can then run a fulltext search query against 'nt:file' nodes, and the search includes the content of the 'jcr:content' child node.
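
To make the effect of aggregation concrete, here is a toy model in plain Java. It is not the eXo JCR or Lucene implementation; the AggregationSketch class and its methods are hypothetical and only illustrate the idea that, once the child's text is merged into the parent's indexed document, a search on the parent finds words stored in the child.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of index aggregation (hypothetical, for illustration only). */
final class AggregationSketch {
    // One "indexed document" (lower-cased text) per file node path.
    private final Map<String, String> indexedText = new LinkedHashMap<>();

    /** With aggregation on, the child's text is appended to the parent's document. */
    void indexFile(String path, String ownText, String contentText, boolean aggregate) {
        String text = aggregate ? ownText + " " + contentText : ownText;
        indexedText.put(path, text.toLowerCase(Locale.ROOT));
    }

    /** Returns the paths whose indexed document contains the term as a whole word. */
    List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : indexedText.entrySet()) {
            // Split on anything that is not a letter or digit.
            for (String token : entry.getValue().split("[^\\p{L}\\p{N}]+")) {
                if (token.equals(needle)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }
}
```

In this model, a file indexed with aggregate set to false is never found by its content, which mirrors the situation before indexing-configuration.xml is changed.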

Repository structure

The repository contains different nt:file nodes.

root
- document1 (nt:file)
  - jcr:content (nt:resource) jcr:data = "The quick brown fox jumps over the lazy dog."
- document2 (nt:file)
  - jcr:content (nt:resource) jcr:data = "Dogs do not like cats."
- document3 (nt:file)
  - jcr:content (nt:resource) jcr:data = "Cats jumping high."

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM nt:file WHERE CONTAINS(*,'cats')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,nt:file)[jcr:contains(.,'cats')]";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


while(it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "document2" and "document3".

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

Table content is:

jcr:path	jcr:score
/document2	1030
/document3	1030

Setting new analyzer and ignoring accent symbols

In this example, you will create a new Analyzer, set it in the QueryHandler configuration, and run a query to check it.

The standard analyzer does not normalize accented characters such as é, è and à; therefore, a word like 'tréma' is stored in the index as 'tréma'. If you want to normalize such characters, so that 'tréma' is stored as 'trema', you can do so with a custom Analyzer.

There are two ways of setting up a new Analyzer:

The first way: Create a descendant class of SearchIndex with a new Analyzer (see Search configuration).

The second way: Register a new Analyzer in the QueryHandler configuration.

Either way, you need to create a new Analyzer (unless an existing one already fits your needs) and set it in the search index.

This example uses the second way:

Create a new MyAnalyzer.

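package org.exoplatform.services.jcr.impl.core;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// custom analyzer: standard tokenization, lower-casing and accent stripping
public class MyAnalyzer extends Analyzer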

{

   @Override

   public TokenStream tokenStream(String fieldName, Reader reader)

   {

      StandardTokenizer tokenStream = new StandardTokenizer(reader);

      // process all text with standard filter

      // removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms.

      TokenStream result = new StandardFilter(tokenStream);

      // this filter normalizes token text to lower case

      result = new LowerCaseFilter(result);

      // this one replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalents

      result = new ISOLatin1AccentFilter(result);

      // and finally return token stream

      return result;

   }

}

Register the new MyAnalyzer in the QueryHandler configuration of the workspace:


<workspace name="ws">

   ...

   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">

      <properties>

         <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>

         ...

      </properties>

   </query-handler>

   ...

</workspace>

Check it with a query:
Find nodes with the 'mix:title' mixin type whose 'jcr:title' contains the words "tréma" and "naïve".

Repository structure

The repository is filled with nodes of the 'mix:title' mixin type that have different values of the 'jcr:title' property.

root
- node1 (mix:title) jcr:title = "tréma blabla naïve"
- node2 (mix:title) jcr:title = "trema come text naive"

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, 'tr\u00E9ma na\u00EFve')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:title, 'tr\u00E9ma na\u00EFve')]";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


while(it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "node1" and "node2". How is this possible? Remember that MyAnalyzer transforms the word 'tréma' to 'trema' and 'naïve' to 'naive', both at indexing time and in the query, so node2 satisfies the constraint too.
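
The effect of the accent filter can be approximated with the JDK alone. The sketch below is a stand-in under stated assumptions, not the Lucene ISOLatin1AccentFilter: the hypothetical AccentFolding.fold method lower-cases the text and strips combining marks via java.text.Normalizer, which is enough to show why the indexed and the queried terms meet at 'trema' and 'naive'.

```java
import java.text.Normalizer;
import java.util.Locale;

/** JDK-only approximation of lower-casing plus accent stripping (hypothetical helper). */
final class AccentFolding {
    private AccentFolding() {}

    static String fold(String text) {
        // Decompose accented letters into a base letter plus combining marks (NFD).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks, then lower-case what remains.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }
}
```

Both "tréma" and "trema" fold to the same string, so a query for either form matches either node. Unlike the Lucene filter, this approach does not expand ligatures such as 'æ'.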

Also, you can get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

Table content is:

jcr:title	...	jcr:path
tréma blabla naïve	...	/node1
trema come text naive	...	/node2