2.2.3.2. Asynchronous re-indexing

Managing a large data set with JCR in a production environment sometimes requires special operations on the indexes stored on the file system. One such maintenance operation is re-creating the index, or "re-indexing". Re-indexing is needed in various use cases, including hardware faults, hard restarts, data corruption, migrations, and JCR updates that bring new index-related features. Index re-creation is usually requested either at server startup or at runtime.

Note

First, you cannot launch Hot re-indexing via JMX if the index is already in offline mode. Offline mode means that the index is currently involved in another operation, such as re-indexing at startup or being copied to another node in a cluster. Second, Hot Asynchronous Reindexing via JMX and "on startup" re-indexing are completely different features, so you cannot get the state of startup re-indexing with the getHotReindexingState command in the JMX interface. However, some JMX operations are common to both:

  • getIOMode: returns the current index IO mode (READ_ONLY / READ_WRITE), which belongs to the clustered configuration states.

  • getState: returns the current state (ONLINE / OFFLINE).
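
As a rough illustration of reading these two values programmatically, the sketch below registers a stand-in bean on the local platform MBean server and queries it. Everything named here is an assumption made for the example: the DemoIndexMBean interface, the DemoIndex class and the object name "demo.jcr:type=Index,workspace=production" are hypothetical and are not part of JCR. Because the stand-in is a standard MBean, its getters are exposed as the attributes "State" and "IOMode"; if the real bean exposes getState and getIOMode as operations, MBeanServerConnection.invoke would be the call to use instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class IndexStateJmxSketch {

    // Hypothetical management interface standing in for the real index MBean.
    public interface DemoIndexMBean {
        String getState();   // ONLINE / OFFLINE
        String getIOMode();  // READ_ONLY / READ_WRITE
    }

    // Hypothetical implementation that always reports a healthy index.
    public static class DemoIndex implements DemoIndexMBean {
        @Override public String getState() { return "ONLINE"; }
        @Override public String getIOMode() { return "READ_WRITE"; }
    }

    // Registers the stand-in bean, reads both values over JMX, then unregisters it.
    static String[] readStateAndIoMode() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Hypothetical object name; the real one depends on the deployment.
            ObjectName name = new ObjectName("demo.jcr:type=Index,workspace=production");
            server.registerMBean(new DemoIndex(), name);
            try {
                String state = (String) server.getAttribute(name, "State");
                String ioMode = (String) server.getAttribute(name, "IOMode");
                return new String[] {state, ioMode};
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX lookup failed", e);
        }
    }

    public static void main(String[] args) {
        String[] values = readStateAndIoMode();
        System.out.println("state=" + values[0] + ", ioMode=" + values[1]);
    }
}
```

Against a running server, the same getAttribute or invoke calls would go through an MBeanServerConnection obtained from a JMX connector, using the object name shown by the JMX console.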

On startup indexing

A common use case for updating and re-creating the index is to stop the server and manually remove the indexes of the workspaces that require it. When the server starts, the missing indexes are automatically recovered by re-indexing.

JCR supports direct RDBMS re-indexing, which is usually faster than ordinary re-indexing and can be enabled by setting the rdbms-reindexing QueryHandler parameter to "true" (refer to the Query-handler configuration overview for more information).

Another new feature is asynchronous indexing on startup. Normally, startup is blocked until the indexing process has finished, and this can take any amount of time depending on how much data is persisted in the repositories. Asynchronous startup indexing avoids the problem: it performs all index operations in the background, without blocking the repository. This is controlled by the value of the "async-reindexing" parameter in the QueryHandler configuration. With asynchronous indexing active, JCR starts with no active indexes present. Queries on JCR can still be executed without exceptions, but they return no results until index creation has completed. The index state can be checked via QueryManagerImpl:

// "workspace" is the javax.jcr.Workspace of the current session.
boolean online =
      ((QueryManagerImpl)workspace.getQueryManager()).getQueryHandeler().isOnline();

The "OFFLINE" state means that the index is currently being re-created. Whenever the state changes, a corresponding log event is printed. When the background task starts, the index is switched to "OFFLINE" and the following event is logged:

[INFO] Setting index OFFLINE (repository/production[system]).

When the process has finished, two events are logged:

[INFO] Created initial index for 143018 nodes (repository/production[system]).
[INFO] Setting index ONLINE (repository/production[system]).

These two log lines indicate the end of the process for the workspace given in brackets. From then on, calling isOnline() as described above also returns true.
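
Code that must not query before the index is ready could poll the online check rather than watch the log. The helper below is a minimal sketch of that idea, not part of JCR: the class name IndexOnlineWaiter, the method awaitOnline and its timeout parameters are assumptions made for this example, and the index is simulated with a counter so the snippet runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class IndexOnlineWaiter {

    // Polls the supplied check until it returns true or the timeout elapses.
    // Returns true if the index came online in time, false otherwise.
    static boolean awaitOnline(BooleanSupplier isOnline, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isOnline.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while the index was still OFFLINE
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated index that reports ONLINE from the third check onwards.
        AtomicInteger checks = new AtomicInteger();
        BooleanSupplier simulated = () -> checks.incrementAndGet() >= 3;
        System.out.println("online=" + awaitOnline(simulated, 5_000, 10));
    }
}
```

In an application, the supplier would wrap the isOnline() call on the QueryHandler shown earlier in place of the simulated counter, with a timeout chosen according to the expected size of the repository.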

Hot asynchronous workspace reindexing via JMX

Hard system faults, errors during upgrades, migration issues and other factors may corrupt the index. End customers will most likely want index issues on production systems to be fixed at runtime, without delays or restarts. The current version of JCR supports the "Hot Asynchronous Workspace Reindexing" feature. It allows the end user (Service Administrator) to launch the process in the background, without stopping or blocking the whole application, from any JMX-compatible console (see the "JConsole in action" screenshot below).

The server can continue working as expected while the index is re-created. This depends on the "allow queries" flag, which is passed via the JMX interface when the re-index operation is invoked. If the flag is set, the application continues working. However, there is one critical limitation that end users must be aware of. The index is frozen while the background task is running: queries are performed on the index as it existed when the task started, and data written to the repository after that moment is not available through search until the process has finished. Data added during re-indexing is also indexed, but becomes available only when the task is done. In brief, JCR takes a snapshot of the indexes when the asynchronous task starts and uses it for searches. When the operation has finished, the stale indexes are replaced with the newly created ones, which include the newly added data. If the "allow queries" flag is set to "false", all queries throw an exception while the task is running. The current state can be acquired using the following JMX operation:

Copyright ©. All rights reserved. eXo Platform SAS