3.1.4.1.7. Indexing rules and additional features

Highlighting search result

It is also called "Excerpt" (see Excerpt configuration in the Search Configuration section and in the Searching Repository).

The goal of this query is to find the words "eXo" and "implementation" with a full-text search and highlight them in the result value.

Basic info

Highlighting is not enabled by default, so you must turn it on in jcr-config.xml and also define an excerpt provider:


<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">

   <properties>

      ...

      <property name="support-highlighting" value="true" />

      <property name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.WeightedHTMLExcerpt"/>

      ...

   </properties>

</query-handler>

You can also define indexing rules, as in the example below:

Write a rule for all nodes with the 'nt:unstructured' primary node type whose 'rule' property equals the "excerpt" string. For those nodes, exclude the "title" property from highlighting and make the "text" property highlightable. The indexing-configuration.xml file must contain the following rule:


<index-rule nodeType="nt:unstructured" condition="@rule='excerpt'">

   <property useInExcerpt="false">title</property>

   <property>text</property>

</index-rule>

Repository structure

You have a single node with the 'nt:unstructured' primary type.

document (nt:unstructured)
- rule = "excerpt"
- title = "eXoJCR"
- text = "eXo is a JCR implementation"

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT rep:excerpt() FROM nt:unstructured WHERE CONTAINS(*, 'eXo implementation')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(., 'eXo implementation')]/rep:excerpt(.)";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Now, look at the result table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

The table content is:

rep:excerpt()	jcr:path	jcr:score
<div><span><strong>eXo</strong> is a JCR <strong>implementation</strong></span></div>	/testroot/node1	335

As you see, the words "eXo" and "implementation" are highlighted.

You can also get the "rep:excerpt" value directly:

RowIterator rows = result.getRows();

Value excerpt = rows.nextRow().getValue("rep:excerpt(.)");

// excerpt will be equal to "<div><span><strong>eXo</strong> is a JCR <strong>implementation</strong></span></div>"
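If an application needs the highlighted terms themselves rather than the HTML, they can be pulled out of the excerpt string. The following is a minimal sketch in plain Java with no JCR dependency. It assumes that the excerpt provider wraps each hit in <strong>...</strong>, as in the value shown above; the ExcerptTerms class and the highlightedTerms method are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExcerptTerms {
    // Matches the text between <strong> and </strong> (non-greedy).
    private static final Pattern STRONG = Pattern.compile("<strong>(.*?)</strong>");

    // Collects every highlighted fragment in document order.
    static List<String> highlightedTerms(String excerptHtml) {
        List<String> terms = new ArrayList<>();
        Matcher m = STRONG.matcher(excerptHtml);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    public static void main(String[] args) {
        String excerpt = "<div><span><strong>eXo</strong> is a JCR "
            + "<strong>implementation</strong></span></div>";
        System.out.println(highlightedTerms(excerpt)); // prints [eXo, implementation]
    }
}
```

In a real application, the input string would come from the excerpt Value via getString().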

Indexing boost value

In this example, you will set different boost values for predefined nodes and check the effect by selecting those nodes and ordering them by jcr:score.

The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.

Note

See Search configuration.

Indexing configuration

In the indexing-configuration.xml file, set boost values for the 'text' property of nt:unstructured nodes.


<!-- 

This rule actually does nothing. The 'text' property keeps the default boost value.

-->

<index-rule nodeType="nt:unstructured" condition="@rule='boost1'">

   <!-- default boost: 1.0 -->

   <property>text</property>

</index-rule>



<!-- 

Set the boost value to 2.0 for the 'text' property in nt:unstructured nodes where the 'rule' property equals 'boost2'

-->

<index-rule nodeType="nt:unstructured" condition="@rule='boost2'">

   <!-- boost: 2.0 -->

   <property boost="2.0">text</property>

</index-rule>



<!-- 

Set the boost value to 3.0 for the 'text' property in nt:unstructured nodes where the 'rule' property equals 'boost3'

-->

<index-rule nodeType="nt:unstructured" condition="@rule='boost3'">

   <!-- boost: 3.0 -->

   <property boost="3.0">text</property>

</index-rule>

Repository structure

The repository contains several nodes with the "nt:unstructured" primary type. Each node has the same 'text' property and a 'rule' property with a different value.

root
- node1(nt:unstructured) rule='boost1' text='The quick brown fox jump...'
- node2(nt:unstructured) rule='boost2' text='The quick brown fox jump...'
- node3(nt:unstructured) rule='boost3' text='The quick brown fox jump...'

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM nt:unstructured WHERE CONTAINS(text, 'quick') ORDER BY jcr:score() DESC";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(@text, 'quick')] order by @jcr:score descending";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


// iterate over all found nodes, ordered by score

while (it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return the nodes in the following order: "node3", "node2", "node1".
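To see why this order is expected, here is a toy model in plain Java. It is not Lucene's actual scoring formula: it assumes that all three nodes get the same base relevance for 'quick' (their 'text' values are identical) and that the configured boost acts as a simple multiplier. BoostOrder and Hit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BoostOrder {
    // A search hit: node name plus the boost configured for its 'text' property.
    static final class Hit {
        final String name;
        final double boost;

        Hit(String name, double boost) {
            this.name = name;
            this.boost = boost;
        }

        // Simplifying assumption: score = base relevance * boost.
        double score(double base) {
            return base * boost;
        }
    }

    // Returns node names sorted by descending score, like ORDER BY jcr:score() DESC.
    static List<String> orderByScoreDesc(List<Hit> hits, double base) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score(base)).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("node1", 1.0), new Hit("node2", 2.0), new Hit("node3", 3.0));
        System.out.println(orderByScoreDesc(hits, 1.0)); // prints [node3, node2, node1]
    }
}
```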

Exclusion from node scope index

This example excludes the 'text' property of some nt:unstructured nodes from the node scope index. As a result, such a node will not be found by the content of this property in a node-scope search, even if it satisfies all other constraints.

First of all, add rules to the indexing-configuration.xml file:


<index-rule nodeType="nt:unstructured" condition="@rule='nsiTrue'">

    <!-- default value for nodeScopeIndex is true -->

    <property>text</property>

</index-rule>



<index-rule nodeType="nt:unstructured" condition="@rule='nsiFalse'">

    <!-- do not include text in node scope index -->

    <property nodeScopeIndex="false">text</property>

</index-rule>

Note

See Search configuration.

Repository structure

The repository contains "nt:unstructured" nodes with the same 'text' property and different 'rule' properties (one node has no 'rule' property at all).

root
- node1 (nt:unstructured) rule="nsiTrue" text="The quick brown fox ..."
- node2 (nt:unstructured) rule="nsiFalse" text="The quick brown fox ..."
- node3 (nt:unstructured) text="The quick brown fox ..." // as you see, this node is not covered by any rule in indexing-configuration.xml

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM nt:unstructured WHERE CONTAINS(*,'quick')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(., 'quick')]";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Get nodes:

NodeIterator it = result.getNodes();


// iterate over all found nodes

while (it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "node1" and "node3". As you see, "node2" is not in the result set.

Also, you can get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

Table content is:

jcr:primarytype	jcr:path	jcr:score
nt:unstructured	/node1	3806
nt:unstructured	/node3	3806

Regular expressions as property name in indexing rule

This example explains how to configure indexing so that all properties of nt:unstructured nodes are excluded from search, except properties whose names end with the 'Text' string. First of all, add a rule to the indexing-configuration.xml file:


<index-rule nodeType="nt:unstructured">

   <property isRegexp="true">.*Text</property>

</index-rule>

Note

See Search Configuration.

Now, check this rule with a simple query that selects all nodes with the 'nt:unstructured' primary type that contain the 'quick' string (a full-text search over the whole node).

Repository structure

The repository contains "nt:unstructured" nodes whose properties have different 'Text'-like names.

root
- node1 (nt:unstructured) Text="The quick brown fox ..."
- node2 (nt:unstructured) OtherText="The quick brown fox ..."
- node3 (nt:unstructured) Textle="The quick brown fox ..."

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM nt:unstructured WHERE CONTAINS(*,'quick')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(., 'quick')]";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Get nodes:

NodeIterator it = result.getNodes();


// iterate over all found nodes

while (it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "node1" and "node2". As you see, "node3" is not in the result set.

Also, you can get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

Table content is:

jcr:primarytype	jcr:path	jcr:score
nt:unstructured	/node1	3806
nt:unstructured	/node2	3806
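The result set above follows from how the expression .*Text relates to each property name. The check below is a small sketch in plain Java. It assumes that the rule is applied to the whole property name (which is consistent with the result above) and that the expression behaves as it does in java.util.regex; RegexpRuleCheck is a hypothetical helper and not part of eXo JCR.

```java
import java.util.regex.Pattern;

class RegexpRuleCheck {
    // Same expression as in the index rule.
    private static final Pattern RULE = Pattern.compile(".*Text");

    // True when the whole property name matches the rule's expression.
    static boolean isIndexed(String propertyName) {
        return RULE.matcher(propertyName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Text", "OtherText", "Textle"}) {
            System.out.println(name + " -> " + isIndexed(name));
        }
        // prints:
        // Text -> true
        // OtherText -> true
        // Textle -> false
    }
}
```

"Textle" fails because the expression requires the name to end with "Text", so its 'quick' content never reaches the index.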

Synonym provider

Find all mix:title nodes whose title contains synonyms of the word 'fast'.

Note

See also the synonym provider configuration in Searching for repository content.

The synonym provider must be configured in the query-handler section of the workspace configuration (for example, in jcr-config.xml):


<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">

   <properties>

      ...

      <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />

      <property name="synonymprovider-config-path" value="../../synonyms.properties" />

      ...

   </properties>

</query-handler>

The synonyms.properties file contains the following list of synonyms:

ASF=Apache Software Foundation
quick=fast
sluggish=lazy

Repository structure

The repository contains mix:title nodes, where jcr:title has different values.

root
- document1 (mix:title) jcr:title="The quick brown fox jumps over the lazy dog."

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, '~fast')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:title, '~fast')]";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Get nodes:

NodeIterator it = result.getNodes();


// iterate over all found nodes

while (it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return document1, as expected. This is the purpose of a synonym provider: you search for a specified word, and nodes that contain any of its synonyms are returned as well.
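The lookup behind '~fast' can be pictured with a short sketch in plain Java. It assumes that each line of the properties file is treated as a pair of synonyms in both directions (which matches the result above, where a search for 'fast' finds 'quick') and that terms are compared in lower case; both points are assumptions of this sketch, and SynonymLookup is a hypothetical class, not the provider shipped with eXo JCR.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class SynonymLookup {
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    // Parses "a=b" lines and registers each pair in both directions (assumption).
    SynonymLookup(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key);
            add(key, value);
            add(value, key);
        }
    }

    private void add(String term, String synonym) {
        synonyms.computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
            .add(synonym.toLowerCase(Locale.ROOT));
    }

    // Returns the term itself plus its known synonyms, i.e. the words a '~term' search would cover.
    Set<String> expand(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> result = new TreeSet<>();
        result.add(key);
        result.addAll(synonyms.getOrDefault(key, Set.of()));
        return result;
    }

    public static void main(String[] args) {
        SynonymLookup lookup = new SynonymLookup(
            "ASF=Apache Software Foundation\nquick=fast\nsluggish=lazy\n");
        System.out.println(lookup.expand("fast")); // prints [fast, quick]
    }
}
```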

Checking spelling

Check the correct spelling of the phrase 'quik OR (-foo bar)' according to the data already stored in the index.

Note

See also SpellChecker configuration in Searching for repository content.

The spell checker must be set in the query-handler configuration.

See the test-jcr-config.xml file below:


<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">

   <properties>

      ...

      <property name="spellchecker-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$FiveSecondsRefreshInterval" />

      ...

   </properties>

</query-handler>

Repository structure

The repository contains a node with the "The quick brown fox jumps over the lazy dog." string property.

root
- node1 property="The quick brown fox jumps over the lazy dog."

Query execution

The query looks at the root node only, because the spell checker draws its suggestions from the whole index, so a more complicated query would be redundant.

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('quik OR (-foo bar)')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "/jcr:root[rep:spellcheck('quik OR (-foo bar)')]/(rep:spellcheck())";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Get suggestion of the correct spelling as follows:

RowIterator rows = result.getRows();

Row r = rows.nextRow();

Value v = r.getValue("rep:spellcheck()");

String correctPhrase = v.getString();

So, the correct spelling for the phrase "quik OR (-foo bar)" is "quick OR (-fox bar)".
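The idea of suggesting a term "according to data already stored in index" can be illustrated with a toy sketch in plain Java. It is not how the Lucene-based spell checker is implemented: the sketch simply picks the indexed term with the smallest Levenshtein edit distance, within a maximum distance chosen here as an assumption. SpellSuggest and its methods are hypothetical names.

```java
import java.util.List;

class SpellSuggest {
    // Classic dynamic-programming Levenshtein edit distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest indexed term within maxDistance, or the word itself if none is close enough.
    static String suggest(String word, List<String> indexedTerms, int maxDistance) {
        String best = word;
        int bestDistance = maxDistance + 1;
        for (String term : indexedTerms) {
            int d = distance(word, term);
            if (d < bestDistance) {
                best = term;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Terms of the indexed property "The quick brown fox jumps over the lazy dog."
        List<String> terms = List.of("the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog");
        System.out.println(suggest("quik", terms, 1)); // prints quick
        System.out.println(suggest("foo", terms, 1));  // prints fox
        System.out.println(suggest("bar", terms, 1));  // prints bar
    }
}
```

With these inputs, "quik" and "foo" each have exactly one indexed term at distance 1, while "bar" has none and is left unchanged, which mirrors the corrected phrase above.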

Finding similar nodes

Find nodes similar to the node at the '/baseFile/jcr:content' path.

In this example, the baseFile node contains text in which the word "terms" occurs many times. That is why the presence of this word is used as the criterion of similarity to the baseFile node.

Note

See also similarity and configuration in Searching for repository content.

Highlighting support must be added to the test-jcr-config.xml configuration file:

<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">

   <properties>

      ...

      <property name="support-highlighting" value="true" />

      ...

   </properties>

</query-handler>

Repository structure

The repository contains several "nt:file" nodes:

root
- baseFile (nt:file)
  - jcr:content (nt:resource) jcr:data="Similarity is determined by looking up terms that are common to nodes. There are some conditions that must be met for a term to be considered. This is required to limit the number possibly relevant terms.
    Only terms with at least 4 characters are considered.
    Only terms that occur at least 2 times in the source node are considered.
    Only terms that occur in at least 5 nodes are considered."
- target1 (nt:file)
  - jcr:content (nt:resource) jcr:data="Similarity is determined by looking up terms that are common to nodes."
- target2 (nt:file)
  - jcr:content (nt:resource) jcr:data="There is no you know what"
- target3 (nt:file)
  - jcr:content (nt:resource) jcr:data=" Terms occur here"

Query execution

SQL

// make SQL query

QueryManager queryManager = workspace.getQueryManager();

// create query

String sqlStatement = "SELECT * FROM nt:resource WHERE SIMILAR(.,'/baseFile/jcr:content')";

Query query = queryManager.createQuery(sqlStatement, Query.SQL);

// execute query and fetch result

QueryResult result = query.execute();

XPath

// make XPath query

QueryManager queryManager = workspace.getQueryManager();

// create query

String xpathStatement = "//element(*, nt:resource)[rep:similar(., '/baseFile/jcr:content')]";

Query query = queryManager.createQuery(xpathStatement, Query.XPATH);

// execute query and fetch result

QueryResult result = query.execute();

Fetching result

Let's get nodes:

NodeIterator it = result.getNodes();


// iterate over all found nodes

while (it.hasNext())

{

   Node foundNode = it.nextNode();

}

NodeIterator will return "/baseFile/jcr:content", "/target1/jcr:content" and "/target3/jcr:content".

As you see, the base node is also in the result set.

You can also get a table:

String[] columnNames = result.getColumnNames();

RowIterator rit = result.getRows();

while (rit.hasNext())

{

   Row row = rit.nextRow();

   // get values of the row

   Value[] values = row.getValues();

}

The table content is:

jcr:path	...	jcr:score
/baseFile/jcr:content	...	2674
/target1/jcr:content	...	2674
/target3/jcr:content	...	2674
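The sample text stored in baseFile lists conditions that a term must meet to count toward similarity. The first two of them (at least 4 characters, at least 2 occurrences in the source node) can be sketched in plain Java as follows. The third one (occurrence in at least 5 nodes) needs index-wide statistics and is left out. The thresholds are taken from that sample text and are not a verified eXo JCR setting; SimilarityTerms and candidateTerms are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SimilarityTerms {
    // Returns lower-cased terms that have at least minLength characters
    // and occur at least minCount times in the source text.
    static Set<String> candidateTerms(String text, int minLength, int minCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.length() >= minLength) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "Similarity is determined by looking up terms that are common to nodes. "
            + "Only terms with at least 4 characters are considered.";
        System.out.println(candidateTerms(source, 4, 2)); // prints [terms]
    }
}
```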