The shared index is consistent and stable, but slow. The local index is fast, but needs a long time to re-synchronize when a cluster node leaves the cluster, even for a short period. The RSync-based index solves this re-synchronization problem while keeping the speed advantages of the local file system.
This strategy works like the shared index, but stores the actual data on the local file system instead of a shared one. It eventually triggers a synchronization job that works at the level of file blocks, so only modified data is synchronized. The diagram shows it in action. Only a single node in the cluster, the Coordinator node, is responsible for modifying index files. When data is persisted, the corresponding command is fired, starting synchronization jobs across the cluster.
A mandatory requirement for the RSync-based indexing strategy is an installed and properly configured RSync utility. It must be accessible by calling "rsync" without defining its full path. In addition, each cluster node should have a running RSync Server supporting the "rsync://" protocol. For more details, refer to the RSync and operating system documentation. A sample RSync Server configuration will be shown below. There are some additional limitations as well. The path of the index for each workspace must be the same across the cluster, for example, "/var/data/index/<repository-name>/<workspace-name>".
The next limitation concerns the RSync Server configuration. It must share one of the index's parent folders, for example, "/var/data/index". In other words, the index is stored inside the RSync Server shared folder. Configuration details are given below.
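The two requirements above (the "rsync" command being callable without a full path, and the index living inside the shared folder) can be checked before starting a node. The following is only a minimal sketch in Python, not part of eXo JCR; the function names `rsync_on_path` and `is_under_shared_folder` are hypothetical helpers introduced for illustration, and the paths are the example values used in this section.

```python
import shutil
from pathlib import PurePosixPath


def rsync_on_path():
    """Return True if an "rsync" executable is found on PATH (hypothetical helper)."""
    return shutil.which("rsync") is not None


def is_under_shared_folder(index_dir, shared_dir):
    """Return True if index_dir is a strict descendant of shared_dir.

    Both arguments are treated as POSIX-style paths; trailing slashes
    are ignored by PurePosixPath.
    """
    index = PurePosixPath(index_dir)
    shared = PurePosixPath(shared_dir)
    return shared in index.parents


if __name__ == "__main__":
    # Example values taken from this section; adjust to the real deployment.
    print("rsync on PATH:", rsync_on_path())
    print("index inside shared folder:",
          is_under_shared_folder("/var/data/index/repository1/production",
                                 "/var/data/index"))
```

Running the same check on every cluster node is one way to confirm that the index path is identical across the cluster.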
The configuration has much in common with the shared index; it only requires some additional parameters for the RSync options. If they are present, JCR switches from the shared to the RSync-based index. Here is an example configuration:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      <property name="index-dir" value="/var/data/index/repository1/production" />
      <property name="changesfilter-class"
         value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
      <property name="jbosscache-configuration" value="jar:/conf/portal/cluster/jbosscache-indexer.xml" />
      <property name="jgroups-configuration" value="jar:/conf/portal/cluster/udp-mux.xml" />
      <property name="jgroups-multiplexer-stack" value="false" />
      <property name="jbosscache-cluster-name" value="JCR-cluster-indexer" />
      <property name="jbosscache-shareable" value="true" />
      <property name="max-volatile-time" value="60" />
      <property name="rsync-entry-name" value="index" />
      <property name="rsync-entry-path" value="/var/data/index" />
      <property name="rsync-port" value="8085" />
      <property name="rsync-user" value="rsyncexo" />
      <property name="rsync-password" value="exo" />
   </properties>
</query-handler>
Let's start with authentication: "rsync-user" and "rsync-password". They are optional and can be skipped if the RSync Server is configured to accept anonymous identity. Before reviewing the other RSync index options, have a look at the RSync Server configuration. A sample RSync Server (rsyncd) configuration is as follows:
uid = nobody
gid = nobody
use chroot = no
port = 8085
log file = rsyncd.log
pid file = rsyncd.pid
[index]
    path = /var/data/index
    comment = indexes
    read only = true
    auth users = rsyncexo
    secrets file = rsyncd.secrets
This sample configuration shares the "/var/data/index" folder as an "index" entry. Those parameters should match the corresponding "rsync-entry-name", "rsync-entry-path" and "rsync-port" properties in the JCR configuration. Make sure "index-dir" is a descendant folder of the RSync shared folder and that those paths are the same on each cluster node.
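The matching rules described above can be expressed as a small consistency check. The following Python sketch is an illustration only and is not shipped with JCR or RSync: `parse_rsyncd_conf` and `check_rsync_settings` are hypothetical names, the JCR properties are passed in as a plain dictionary, and the rsyncd configuration is read with Python's `configparser` under the assumption that it uses simple `key = value` lines as in the sample.

```python
import configparser
from pathlib import PurePosixPath

# Sample values copied from this section.
RSYNCD_CONF = """
uid = nobody
gid = nobody
use chroot = no
port = 8085
[index]
    path = /var/data/index
    read only = true
    auth users = rsyncexo
    secrets file = rsyncd.secrets
"""

JCR_PROPS = {
    "index-dir": "/var/data/index/repository1/production",
    "rsync-entry-name": "index",
    "rsync-entry-path": "/var/data/index",
    "rsync-port": "8085",
}


def parse_rsyncd_conf(text):
    """Parse simple rsyncd.conf text (hypothetical helper).

    Lines are stripped so that configparser does not treat indented
    lines as value continuations; global parameters are placed under
    a synthetic "__global__" section.
    """
    body = "\n".join(line.strip() for line in text.splitlines())
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string("[__global__]\n" + body)
    return parser


def check_rsync_settings(props, rsyncd_text):
    """Return a list of mismatches between JCR properties and rsyncd.conf."""
    conf = parse_rsyncd_conf(rsyncd_text)
    entry = props["rsync-entry-name"]
    if not conf.has_section(entry):
        return ["no [%s] entry in the rsyncd configuration" % entry]

    problems = []
    shared = PurePosixPath(conf.get(entry, "path", fallback=""))
    if shared != PurePosixPath(props["rsync-entry-path"]):
        problems.append("rsync-entry-path does not match the shared path")
    # The rsync daemon listens on port 873 when no port is configured.
    if conf.get("__global__", "port", fallback="873") != str(props["rsync-port"]):
        problems.append("rsync-port does not match the server port")
    if shared not in PurePosixPath(props["index-dir"]).parents:
        problems.append("index-dir is not inside the shared folder")
    return problems
```

Calling `check_rsync_settings(JCR_PROPS, RSYNCD_CONF)` with the sample values returns an empty list; each mismatched property adds one message to the result.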