2.2.3.1.4. RSync index

The shared index is consistent and stable, but slow. The local index is fast, but needs a long time to re-synchronize when a node leaves the cluster, even for a short period. The RSync-based index addresses this re-synchronization problem while keeping the speed advantage of the local file system.

This strategy works like the shared index, but stores the actual data on the local file system instead of a shared one. It then triggers a synchronization job that works at the level of file blocks, so only modified data is synchronized. The diagram shows it in action. Only a single node in the cluster, the Coordinator node, is responsible for modifying index files. When data is persisted, a corresponding command is fired, starting synchronization jobs across the cluster.

System requirements

The RSync-based indexing strategy requires an installed and properly configured RSync utility. It must be accessible by calling "rsync" without specifying its full path. In addition, each cluster node should have a running RSync Server supporting the "rsync://" protocol. For more details, refer to the RSync and operating system documentation. A sample RSync Server configuration is shown below.

There are some additional limitations. The index path for each workspace must be the same across the cluster, for example, "/var/data/index/<repository-name>/<workspace-name>". The next limitation concerns the RSync Server configuration: it must share one of the index's parent folders, for example, "/var/data/index". In other words, the index is stored inside the RSync Server shared folder. Configuration details are given below.

Configuration

The configuration has much in common with the shared index; it only requires some additional parameters for the RSync options. If they are present, JCR switches from the shared index to the RSync-based index. Here is an example configuration:


<query-handler
   class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      <property name="index-dir" value="/var/data/index/repository1/production" />
      <property name="changesfilter-class"
         value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
      <property name="jbosscache-configuration" value="jar:/conf/portal/cluster/jbosscache-indexer.xml" />
      <property name="jgroups-configuration" value="jar:/conf/portal/cluster/udp-mux.xml" />
      <property name="jgroups-multiplexer-stack" value="false" />
      <property name="jbosscache-cluster-name" value="JCR-cluster-indexer" />
      <property name="jbosscache-shareable" value="true" />
      <property name="max-volatile-time" value="60" />
      <property name="rsync-entry-name" value="index" />
      <property name="rsync-entry-path" value="/var/data/index" />
      <property name="rsync-port" value="8085" />
      <property name="rsync-user" value="rsyncexo" />
      <property name="rsync-password" value="exo" />
   </properties>
</query-handler>
    

Let's start with authentication: "rsync-user" and "rsync-password". They are optional and can be omitted if the RSync Server is configured to accept anonymous connections. Before reviewing the other RSync index options, have a look at the RSync Server configuration. A sample RSync Server (rsyncd) configuration is as follows:

uid = nobody
gid = nobody
use chroot = no
port = 8085
log file = rsyncd.log
pid file = rsyncd.pid

[index]
   path = /var/data/index
   comment = indexes
   read only = true
   auth users = rsyncexo
   secrets file = rsyncd.secrets
    

This sample configuration shares the "/var/data/index" folder as the "index" entry. The entry name, path and port must match the corresponding "rsync-entry-name", "rsync-entry-path" and "rsync-port" properties in the JCR configuration.
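The correspondence can be spelled out property by property. The fragment below repeats only the RSync-related properties from the example configuration, with comments naming the rsyncd setting each one is expected to match. The comments are a reading of the two samples, not additional configuration; in particular, the content of the "rsyncd.secrets" file is not shown in this guide, so the password match is an assumption.

```xml
<!-- Fragment of <properties> inside <query-handler>; all other properties omitted. -->
<properties>
   <!-- matches the entry (module) name in the rsyncd configuration: [index] -->
   <property name="rsync-entry-name" value="index" />
   <!-- matches "path = /var/data/index" of that entry -->
   <property name="rsync-entry-path" value="/var/data/index" />
   <!-- matches "port = 8085" -->
   <property name="rsync-port" value="8085" />
   <!-- matches "auth users = rsyncexo"; optional if the server accepts anonymous connections -->
   <property name="rsync-user" value="rsyncexo" />
   <!-- assumed to be the password stored for rsyncexo in the secrets file; optional as above -->
   <property name="rsync-password" value="exo" />
</properties>
```

If the server is set up for anonymous access, the last two properties can simply be left out, as described in the authentication paragraph.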

Note

Make sure "index-dir" is a descendant folder of the RSync shared folder and that these paths are the same on each cluster node.
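As an illustration of this rule, a second workspace could keep its index in a sibling folder under the same shared root. This is only a sketch: the workspace name "backup" is hypothetical, the cluster-related properties are omitted for brevity, and reusing one RSync entry for several workspaces is inferred from the path convention "/var/data/index/<repository-name>/<workspace-name>" rather than stated explicitly.

```xml
<!-- Hypothetical second workspace "backup" in repository1; cluster properties omitted. -->
<query-handler
   class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      <!-- descendant of the shared folder /var/data/index, identical on every node -->
      <property name="index-dir" value="/var/data/index/repository1/backup" />
      <!-- same RSync entry as the "production" workspace -->
      <property name="rsync-entry-name" value="index" />
      <property name="rsync-entry-path" value="/var/data/index" />
      <property name="rsync-port" value="8085" />
   </properties>
</query-handler>
```

A path such as "/opt/index/repository1/backup" would break the rule, because it lies outside the folder shared by the RSync Server.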

Copyright ©. All rights reserved. eXo Platform SAS