14.2. JCR asynchronous re-indexing

Warning

You are looking at documentation for an older release. Not what you want? See the current release documentation.

Indexing on start-up

The easiest way to trigger a JCR re-indexing at start-up is to stop the server and manually remove the indexes that need to be recreated. When the server starts, the missing indexes will be detected and the necessary re-indexing operations will begin.

JCR supports direct RDBMS re-indexing. This is usually faster than ordinary re-indexing and can be configured via the rdbms-reindexing QueryHandler parameter set to "true" (Refer to the Query-handler configuration overview for more information).

The start-up is usually blocked until the indexing process finishes. Block time depends on the amount of persisted data in the repositories. You can resolve this issue by using an asynchronous approach to start-up indexation which involves on performing all operations on indexes in the background without blocking the repository. This approach is controlled by the value of the async-reindexing parameter in QueryHandler configuration. Setting async-reindexing to "true" activates asynchronous indexation and makes JCR start without active indexes. But you can still execute queries on JCR without exceptions and check the index status via QueryManagerImpl:

boolean online =
      ((QueryManagerImpl)Workspace.getQueryManager()).getQueryHandeler().isOnline();
    

An "OFFLINE" state means that the index is currently recreating. When the state has been changed, the corresponding log event is printed. From the start of the background task, the index is switched to "OFFLINE" with the following log event:

[INFO] Setting index OFFLINE (repository/production[system]).

When the process has been finished, two events are logged:

[INFO] Created initial index for 143018 nodes (repository/production[system]).
      [INFO] Setting index ONLINE (repository/production[system]).
    

These two log lines indicate the end of process for the workspace given in brackets. Calling isOnline() as mentioned above will also return true.

Hot asynchronous workspace re-indexing via JMX

Note

First of all, you can not launch hot re-indexing via JMX if the index is already in offline mode. This means that the index is currently invoked in some operations, like re-indexing at start-up, copying in cluster to another node or something else. It is also important to note that hot asynchronous re-indexing via JMX and "on start-up" re-indexing are completely different features. You can not perform start-up re-indexing using the getHotReindexingState command in the JMX interface. However there are some common JMX operations:

  • getIOMode: return the current index IO mode (READ_ONLY / READ_WRITE), belongs to clustered configuration states.

  • getState: return the current state (ONLINE / OFFLINE).

Some hard system faults, errors during upgrades, migration issues and some other factors may corrupt the index. End customers would most likely want the production systems to fix index issues during runtime without delays and restarts. The current version of JCR supports the "Hot Asynchronous Workspace Reindexing" feature. It allows administrators to launch the process in background without stopping or blocking the whole application by using any JMX-compatible console.(See the "JConsole in action" screenshot below).

The server can still work as expected while the index is being recreated. This depends on the flag "allow queries", which is passed via the JMX interface to invoke the re-indexing operation. If the flag is set to "true", the application is still working. However, there is one critical limitation that you must be aware of. If the index is frozen while the background task is running, queries are performed on the index present at the moment of task start-up and data written into the repository after start-up will not be available through the search until the process finishes. Data added during re-indexation is also indexed, but will be available only when the task is done. To resume, JCR takes the "snapshot" of indexes on the asynchronous task start-up and uses it for searches. When the operation finishes, the stale indexes are replaced with the new ones, including the newly added data. If the allow queries" flag is set to "false", all queries will throw out an exception while the task is running. The current state can be acquired using the following JMX operation:

Copyright ©. All rights reserved. eXo Platform SAS
blog comments powered byDisqus