Managing a large data set with JCR in a production environment sometimes requires special operations on the indexes stored on the file system. One of these maintenance operations is re-creating the index, or "re-indexing". Re-indexing is needed in various use cases, including hardware faults, hard restarts, data corruption, migrations, and JCR updates that bring new index-related features. Index re-creation is usually requested either at server startup or at runtime.
First of all, Hot re-indexing cannot be launched via JMX if the index is already in offline mode. Offline mode means that the index is currently involved in another operation, such as re-indexing at startup or being copied to another node in a cluster. Another important point is that Hot Asynchronous Reindexing via JMX and re-indexing "on startup" are completely different features, so you cannot get the state of startup re-indexing using the getHotReindexingState command in the JMX interface. However, there are some common JMX operations:
getIOMode: returns the current index IO mode (READ_ONLY / READ_WRITE); the IO mode is relevant to clustered configurations.
getState: returns the current state (ONLINE / OFFLINE).
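As an illustration, the operations above could be called from a small JMX client instead of a console. The sketch below uses only the standard javax.management API. The JMX service URL and the ObjectName of the index MBean are not given in this document, so both are taken from the command line and must be looked up in your own installation (for example, in a JMX console). The class name IndexJmxClient is hypothetical. If your server exposes getState and getIOMode as attributes rather than operations, getAttribute would be needed instead of invoke.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Sketch: calling no-argument JMX operations such as getState or getIOMode. */
class IndexJmxClient {

    /** Invokes a no-argument operation on the given MBean and returns its result. */
    static Object invokeNoArg(MBeanServerConnection connection, ObjectName mbean, String operation)
            throws Exception {
        return connection.invoke(mbean, operation, new Object[0], new String[0]);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: IndexJmxClient <jmx-service-url> <index-mbean-object-name>");
            return;
        }
        // Assumption: both values are supplied by the caller; this document
        // does not state the URL or the ObjectName used by the server.
        JMXServiceURL url = new JMXServiceURL(args[0]);
        ObjectName indexMBean = new ObjectName(args[1]);

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("state   = " + invokeNoArg(connection, indexMBean, "getState"));
            System.out.println("io mode = " + invokeNoArg(connection, indexMBean, "getIOMode"));
        }
    }
}
```

Checking getState before launching Hot re-indexing is one way to avoid calling it while the index is already OFFLINE.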
A common way to update and re-create the index is to stop the server and manually remove the indexes of the workspaces that need it. When the server is started again, the missing indexes are automatically recovered by re-indexing.
JCR supports direct RDBMS re-indexing, which is usually faster than ordinary re-indexing. It is enabled by setting the rdbms-reindexing QueryHandler parameter to "true" (refer to the Query-handler configuration overview for more information).
Another new feature is asynchronous indexing on startup. Usually, startup is blocked until the indexing process is finished. The block can last any period of time, depending on the amount of data persisted in the repositories. This can be resolved by indexing asynchronously at startup: all index operations are performed in the background, without blocking the repository. This behavior is controlled by the value of the "async-reindexing" parameter in the QueryHandler configuration. With asynchronous indexing active, JCR starts with no active indexes present. Queries on JCR can still be executed without exceptions, but no results are returned until index creation has completed. The index state can be checked via QueryManagerImpl:
// session is an open javax.jcr.Session for the workspace being checked
boolean online = ((QueryManagerImpl) session.getWorkspace().getQueryManager()).getQueryHandeler().isOnline();
"OFFLINE" state means that the index is currently being re-created. Each state change is reported with a corresponding log event. When the background task starts, the index is switched to "OFFLINE" and the following event is logged:
[INFO] Setting index OFFLINE (repository/production[system]).
When the process has finished, two events are logged:
[INFO] Created initial index for 143018 nodes (repository/production[system]).
[INFO] Setting index ONLINE (repository/production[system]).
These two log lines indicate the end of the process for the workspace given in brackets. From this point, calling isOnline() as mentioned above also returns true.
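Because queries return no results while the index is OFFLINE, code that needs search results right after startup may want to wait for the index to come online. The following is a minimal sketch of such a wait loop, not part of JCR itself. The class IndexOnlineWaiter and its method are hypothetical names, and the check is passed in as a BooleanSupplier so that the sketch stays self-contained. In a real application the supplier would wrap the isOnline() call shown earlier.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Sketch: polling an "is the index online?" check with a timeout. */
class IndexOnlineWaiter {

    /**
     * Polls the supplied check until it reports true or the timeout elapses.
     * Returns true if the index came online in time, false otherwise.
     */
    static boolean waitUntilOnline(BooleanSupplier isOnline, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!isOnline.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while the index was still OFFLINE
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the real check: reports OFFLINE for three polls, then ONLINE.
        int[] polls = {0};
        BooleanSupplier fakeIndex = () -> ++polls[0] > 3;

        boolean online = waitUntilOnline(fakeIndex, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("online = " + online + " after " + polls[0] + " polls");
    }
}
```

A timeout is used rather than an unbounded loop because, as noted above, indexing time depends on the amount of persisted data and cannot be predicted in general.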
Hot asynchronous workspace reindexing via JMX
Hard system faults, errors during upgrades, migration issues and other factors may corrupt the index. End customers will most likely want production systems to fix index issues at runtime, without delays and restarts. The current version of JCR supports the "Hot Asynchronous Workspace Reindexing" feature. It allows the end user (Service Administrator) to launch the process in the background, without stopping or blocking the whole application, by using any JMX-compatible console (see the "JConsole in action" screenshot below).
The server can continue working as expected while the index is re-created. This depends on the "allow queries" flag, which is passed via the JMX interface when the re-index operation is invoked. If the flag is set, the application continues working. However, there is one critical limitation that end users must be aware of. The index is frozen while the background task is running, which means that queries are performed on the index as it was at the moment the task started, and data written into the repository after that moment is not available through search until the process has finished. Data added during re-indexing is also indexed, but becomes available only when the task is done. In brief, JCR takes a snapshot of the indexes when the asynchronous task starts and uses it for searches. When the operation is finished, the stale indexes are replaced with the newly created ones, which include the newly added data. If the "allow queries" flag is set to "false", all queries throw an exception while the task is running. The current state can be obtained using the following JMX operation:
getHotReindexingState(): returns information about the latest invocation: the start time if it is in progress, or the finish time if it is done.
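The snapshot behavior described in this section can be easier to follow with a toy model. The sketch below is not JCR code and does not reflect how JCR implements re-indexing internally. It only mimics the visible rules: queries see the frozen snapshot while the task runs, data added meanwhile becomes visible when the task finishes, and queries fail if the "allow queries" flag is false. All names are hypothetical, and IllegalStateException merely stands in for whatever exception JCR actually throws.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the snapshot rules described above; not JCR code. */
class HotReindexModel {
    private List<String> liveIndex = new ArrayList<>(); // what queries normally see
    private List<String> snapshot;                      // frozen copy used while re-indexing
    private List<String> rebuilding;                    // new index under construction
    private boolean allowQueries;

    void add(String item) {
        if (rebuilding != null) {
            rebuilding.add(item); // indexed, but visible only after finish()
        } else {
            liveIndex.add(item);
        }
    }

    void startReindex(boolean allowQueries) {
        this.allowQueries = allowQueries;
        snapshot = new ArrayList<>(liveIndex);
        rebuilding = new ArrayList<>(liveIndex); // stands in for re-reading persisted data
    }

    List<String> query() {
        if (rebuilding == null) {
            return new ArrayList<>(liveIndex);
        }
        if (!allowQueries) {
            // Stand-in exception type; the real one is not named in this document.
            throw new IllegalStateException("re-indexing in progress, queries not allowed");
        }
        return new ArrayList<>(snapshot);
    }

    void finish() {
        liveIndex = rebuilding; // the stale index is replaced by the new one
        rebuilding = null;
        snapshot = null;
    }

    public static void main(String[] args) {
        HotReindexModel model = new HotReindexModel();
        model.add("node-1");
        model.startReindex(true);
        model.add("node-2");
        System.out.println("during re-indexing: " + model.query());
        model.finish();
        System.out.println("after re-indexing:  " + model.query());
    }
}
```

In this model, the query issued during re-indexing sees only node-1, while the query issued after finish() sees both node-1 and node-2.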