3.1.4.1.6. Fulltext search
Fulltext search by property

Find all nodes that have the 'mix:title' mixin type and whose 'jcr:description' property contains the string "forest".

Repository structure

The repository is filled with nodes of the 'mix:title' mixin type and different values of the 'jcr:description' property.

  • root

    • document1 (mix:title) jcr:description = "The quick brown fox jumps over the lazy dog."

    • document2 (mix:title) jcr:description = "The brown fox lives in a forest." // This is the node we want to find

    • document3 (mix:title) jcr:description = "The fox is a nice animal."

    • document4 (nt:unstructured) jcr:description = "There is the word forest, too."

Query execution

  • SQL

    // make SQL query
    
    QueryManager queryManager = workspace.getQueryManager();
    // we want to find the document that contains the word "forest"
    String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'forest')";
    // create query
    Query query = queryManager.createQuery(sqlStatement, Query.SQL);
    // execute query and fetch result
    QueryResult result = query.execute();
  • XPath

    // make XPath query
    
    QueryManager queryManager = workspace.getQueryManager();
    // we want to find the document that contains the word "forest"
    String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:description, 'forest')]";
    // create query
    Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
    // execute query and fetch result
    QueryResult result = query.execute();
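
The two statements above differ only in syntax, so they can be generated from the same three inputs. The following is a minimal sketch, not part of the JCR API: the class `FulltextStatements` and its methods are hypothetical helpers. It assumes plain search terms (letters, digits and spaces) and rejects anything else rather than attempting to escape it.

```java
final class FulltextStatements {
    private FulltextStatements() {
    }

    // Accept only letters, digits and spaces; quoting and escaping rules
    // for fulltext expressions are deliberately out of scope for this sketch.
    static String requirePlainTerm(String term) {
        if (!term.matches("[\\p{L}\\p{N} ]+")) {
            throw new IllegalArgumentException("unsupported search term: " + term);
        }
        return term;
    }

    // Builds: SELECT * FROM <nodeType> WHERE CONTAINS(<property>, '<term>')
    static String sql(String nodeType, String property, String term) {
        return "SELECT * FROM " + nodeType
            + " WHERE CONTAINS(" + property + ", '" + requirePlainTerm(term) + "')";
    }

    // Builds: //element(*,<nodeType>)[jcr:contains(@<property>, '<term>')]
    static String xpath(String nodeType, String property, String term) {
        return "//element(*," + nodeType + ")[jcr:contains(@" + property
            + ", '" + requirePlainTerm(term) + "')]";
    }

    public static void main(String[] args) {
        System.out.println(sql("mix:title", "jcr:description", "forest"));
        System.out.println(xpath("mix:title", "jcr:description", "forest"));
    }
}
```

The returned strings can be passed to `queryManager.createQuery(...)` with `Query.SQL` or `Query.XPATH` respectively.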

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


if(it.hasNext())
{
   Node foundNode = it.nextNode();
}

NodeIterator will return "document2".

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();
while (rit.hasNext())
{
   Row row = rit.nextRow();
   // get values of the row
   Value[] values = row.getValues();
}

Table content is:

jcr:description | ... | jcr:path
The brown fox lives in a forest. | ... | /document2
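
To print such a table, the column names and row values can be joined line by line. The sketch below is only an illustration: `ResultTable` is a hypothetical helper that works on plain strings. In a real application the header would come from `result.getColumnNames()` and each cell from calling `getString()` on the entries of `row.getValues()`. It assumes that a cell may be missing (null) and prints it as empty.

```java
import java.util.ArrayList;
import java.util.List;

final class ResultTable {
    private ResultTable() {
    }

    // Renders a header line followed by one line per row, cells separated by " | ".
    static String render(String[] columnNames, List<String[]> rows) {
        StringBuilder sb = new StringBuilder(String.join(" | ", columnNames));
        for (String[] row : rows) {
            sb.append('\n');
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(" | ");
                }
                // assumption: a missing value is represented as null
                sb.append(row[i] == null ? "" : row[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] columns = {"jcr:description", "jcr:path"};
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"The brown fox lives in a forest.", "/document2"});
        System.out.println(render(columns, rows));
    }
}
```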

Fulltext search by all properties

Find nodes with the 'mix:title' mixin type where any property contains the 'break' string.

Repository structure

The repository is filled with nodes of the 'mix:title' mixin type and different values of the 'jcr:title' and 'jcr:description' properties.

  • root

    • document1 (mix:title) jcr:title ='Star Wars' jcr:description = 'Dart rules!!'

    • document2 (mix:title) jcr:title ='Prison break' jcr:description = 'Run, Forest, run ))'

    • document3 (mix:title) jcr:title ='Titanic' jcr:description = 'An iceberg breaks a ship.'

Query execution

  • SQL

    // make SQL query
    
    QueryManager queryManager = workspace.getQueryManager();
    String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(*,'break')";
    // create query
    Query query = queryManager.createQuery(sqlStatement, Query.SQL);
    // execute query and fetch result
    QueryResult result = query.execute();
  • XPath

    // make XPath query
    
    QueryManager queryManager = workspace.getQueryManager();
    // we want to find nodes where any property contains 'break'
    String xpathStatement = "//element(*,mix:title)[jcr:contains(.,'break')]";
    // create query
    Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
    // execute query and fetch result
    QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


while(it.hasNext())
{
   Node foundNode = it.nextNode();
}

NodeIterator will return "document2" and "document3".

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();
while (rit.hasNext())
{
   Row row = rit.nextRow();
   // get values of the row
   Value[] values = row.getValues();
}

Table content is:

jcr:title | jcr:description | ... | jcr:path
Prison break | Run, Forest, run )) | ... | /document2
Titanic | An iceberg breaks a ship. | ... | /document3

Finding nt:file document by content of child jcr:content node

The nt:file node type represents a file. It requires a single child node called jcr:content. This node type represents images and other binary content in a JCRWiki entry. The node type of jcr:content is nt:resource, which represents the actual content of the file.

Find nodes whose primary type is 'nt:file' and whose 'jcr:content' child node contains "cats".

By default, a JCR SQL or XPath query cannot find these nodes, because the text belongs to the jcr:content child node rather than to the nt:file node itself. However, you can configure indexing so that nt:file aggregates its jcr:content child node.

To do so, change indexing-configuration.xml:


<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <aggregate primaryType="nt:file">
        <include>jcr:content</include>
        <include>jcr:content/*</include>
        <include-property>jcr:content/jcr:lastModified</include-property>
    </aggregate>
</configuration>

Now the content of the 'nt:file' and 'jcr:content' ('nt:resource') nodes is concatenated into a single Lucene document. You can then run a fulltext search query against 'nt:file' nodes, and the search includes the content of the 'jcr:content' child node.
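
As a rough mental model of this aggregation (a toy simulation, not how Lucene or the indexing configuration is implemented), think of each parent node as owning the combined text of its children, with the search running over that combined text. The class `AggregateSketch`, its methods, and the simple tokenizer are all assumptions made for illustration. The sample data mirrors the repository structure used in this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class AggregateSketch {
    private AggregateSketch() {
    }

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns the parents whose aggregated child text contains the given word.
    static List<String> search(Map<String, List<String>> childTextByParent, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : childTextByParent.entrySet()) {
            // "aggregation": all child text is treated as one document of the parent
            String aggregated = String.join(" ", e.getValue());
            if (tokens(aggregated).contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("document1", Collections.singletonList("The quick brown fox jumps over the lazy dog."));
        files.put("document2", Collections.singletonList("Dogs do not like cats."));
        files.put("document3", Collections.singletonList("Cats jumping high."));
        System.out.println(search(files, "cats")); // prints [document2, document3]
    }
}
```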

Repository structure

The repository contains different nt:file nodes.

  • root

    • document1 (nt:file)

      • jcr:content (nt:resource) jcr:data = "The quick brown fox jumps over the lazy dog."

    • document2 (nt:file)

      • jcr:content (nt:resource) jcr:data = "Dogs do not like cats."

    • document3 (nt:file)

      • jcr:content (nt:resource) jcr:data = "Cats jumping high."

Query execution

  • SQL

    // make SQL query
    
    QueryManager queryManager = workspace.getQueryManager();
    // create query
    String sqlStatement = "SELECT * FROM nt:file WHERE CONTAINS(*,'cats')";
    Query query = queryManager.createQuery(sqlStatement, Query.SQL);
    // execute query and fetch result
    QueryResult result = query.execute();
  • XPath

    // make XPath query
    
    QueryManager queryManager = workspace.getQueryManager();
    // create query
    String xpathStatement = "//element(*,nt:file)[jcr:contains(.,'cats')]";
    Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
    // execute query and fetch result
    QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


while(it.hasNext())
{
   Node foundNode = it.nextNode();
}

NodeIterator will return "document2" and "document3".

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();
while (rit.hasNext())
{
   Row row = rit.nextRow();
   // get values of the row
   Value[] values = row.getValues();
}

Table content is:

jcr:path | jcr:score
/document2 | 1030
/document3 | 1030

Setting new analyzer and ignoring accent symbols

In this example, you will create a new Analyzer, set it in the QueryHandler configuration, and run a query to check it.

The standard analyzer does not normalize accented characters such as é, è and à; therefore, a word like 'tréma' is stored in the index as 'tréma'. If you want to normalize such characters so that 'tréma' is stored as 'trema', you can do so with a custom analyzer.
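
To see what this normalization means in practice, the sketch below folds accents using only the JDK. It is an approximation for illustration, not the filter used later in this example: the hypothetical `AccentFolding.fold` method decomposes characters with `java.text.Normalizer`, strips the combining marks and lower-cases the result, so it does not handle characters that have no decomposition.

```java
import java.text.Normalizer;
import java.util.Locale;

final class AccentFolding {
    private AccentFolding() {
    }

    // Splits accented letters into base letter + combining mark (NFD),
    // removes the marks, then lower-cases the text.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00E9 is 'é' and \u00EF is 'ï'
        System.out.println(fold("tr\u00E9ma na\u00EFve")); // prints trema naive
    }
}
```

With such folding applied both to indexed text and to query terms, 'tréma' and 'trema' become the same token.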

There are two ways of setting up a new Analyzer:

  • The first way: Create a descendant class of SearchIndex with a new Analyzer (see Search configuration). With this approach, you create the new Analyzer yourself (unless an existing one already fits your needs) and set it in the search index.

  • The second way: Register a new Analyzer in the QueryHandler configuration.

This example uses the second way:

  1. Create a new MyAnalyzer.

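    package org.exoplatform.services.jcr.impl.core;

    // imports assume the Lucene 2.x-era API used by this example
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.ISOLatin1AccentFilter;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class MyAnalyzer extends Analyzer
    {
       @Override
       public TokenStream tokenStream(String fieldName, Reader reader)
       {
          StandardTokenizer tokenStream = new StandardTokenizer(reader);
          // process all text with the standard filter:
          // removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms
          TokenStream result = new StandardFilter(tokenStream);
          // this filter normalizes token text to lower case
          result = new LowerCaseFilter(result);
          // this one replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents
          result = new ISOLatin1AccentFilter(result);
          // finally, return the token stream
          return result;
       }
    }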
  2. Register the new MyAnalyzer in the configuration.

    
    <workspace name="ws">
       ...
       <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
          <properties>
             <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
             ...
          </properties>
       </query-handler>
       ...
    </workspace>
  3. Check it with a query:

    Find nodes with the 'mix:title' mixin type whose 'jcr:title' contains the words "tréma" and "naïve".

Repository structure

The repository is filled with nodes of the 'mix:title' mixin type and different values of the 'jcr:title' property.

  • root

    • node1 (mix:title) jcr:title = "tréma blabla naïve"

    • node2 (mix:title) jcr:title = "trema come text naive"

Query execution

  • SQL

    // make SQL query
    
    QueryManager queryManager = workspace.getQueryManager();
    // create query
    String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, 'tr\u00E9ma na\u00EFve')";
    Query query = queryManager.createQuery(sqlStatement, Query.SQL);
    // execute query and fetch result
    QueryResult result = query.execute();
  • XPath

    // make XPath query
    
    QueryManager queryManager = workspace.getQueryManager();
    // create query
    String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:title, 'tr\u00E9ma na\u00EFve')]";
    Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
    // execute query and fetch result
    QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


while(it.hasNext())
{
   Node foundNode = it.nextNode();
}

NodeIterator will return "node1" and "node2". How is this possible? Remember that MyAnalyzer transforms the word 'tréma' to 'trema' and 'naïve' to 'naive', so node2 satisfies the constraint too.

Also, you can get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();
while (rit.hasNext())
{
   Row row = rit.nextRow();
   // get values of the row
   Value[] values = row.getValues();
}

Table content is:

jcr:title | ... | jcr:path
tréma blabla naïve | ... | /node1
trema come text naive | ... | /node2

Copyright ©. All rights reserved. eXo Platform SAS