4.3.2. CMIS search and index

The CMIS standard defines a query language based on a subset of the SQL-92 grammar (ISO/IEC 9075:1992, Database Language SQL), with a few extensions that enhance its filtering capability for the CMIS data model, such as existential quantification for multi-valued properties (ANY), full-text search (CONTAINS), and folder membership (IN_FOLDER and IN_TREE).

Warning

CMIS search is disabled by default in eXo CMIS. To enable query support, uncomment the indexDir parameter in the configuration. The supported search capabilities are listed in the Query Capabilities table below.

CMIS Relational View

The relational view of a CMIS repository consists of a collection of virtual tables that are defined on top of the CMIS data model. A virtual table exists for every queryable object type (or content type, if you prefer) in the repository. Each row in a virtual table corresponds to an instance of the corresponding object type or of one of its subtypes. A column exists for every property that the object type has.
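The mapping from object types to virtual tables can be illustrated with a small Python model. This is only a sketch of the concept, not xCMIS code: the classes ObjectType and CmisObject, the function virtual_table, and the subtype my:invoice are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectType:
    """A queryable object type; 'parent' links a subtype to its supertype."""
    type_id: str
    own_properties: list
    parent: Optional["ObjectType"] = None

    def is_a(self, other):
        """Return True if this type is 'other' or one of its subtypes."""
        current = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False

    def all_properties(self):
        """Inherited properties first, then the ones declared on this type."""
        inherited = self.parent.all_properties() if self.parent else []
        return inherited + self.own_properties


@dataclass
class CmisObject:
    object_type: ObjectType
    values: dict


def virtual_table(object_type, repository):
    """One column per property of the type, one row per instance of the
    type or of any of its subtypes."""
    columns = object_type.all_properties()
    rows = [
        {column: obj.values.get(column) for column in columns}
        for obj in repository
        if obj.object_type.is_a(object_type)
    ]
    return columns, rows


# Hypothetical sample data: one base type and one subtype.
document = ObjectType("cmis:document", ["cmis:objectId", "cmis:name"])
invoice = ObjectType("my:invoice", ["my:amount"], parent=document)
repository = [
    CmisObject(document, {"cmis:objectId": "1", "cmis:name": "readme.txt"}),
    CmisObject(invoice, {"cmis:objectId": "2", "cmis:name": "inv.pdf", "my:amount": 10}),
]

document_columns, document_rows = virtual_table(document, repository)
invoice_columns, invoice_rows = virtual_table(invoice, repository)
```

In this model the cmis:document table contains both objects, because the invoice is an instance of a subtype, while the my:invoice table contains only the invoice and exposes the extra my:amount column.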

Query Capabilities

Capability                         Value
capabilityQuery                    bothcombined (if indexDir is configured; otherwise none)
capabilityJoin                     none
capabilityPWCSearchable            false
capabilityAllVersionsSearchable    false

Configuration

To provide full-text search capabilities, xCMIS uses its own index. It is configured with the following parameter:

Parameter    Default    Description
indexDir     none       The location of the index directory. This parameter is mandatory for the default implementation.

For example, to set up the index directory:


<component>
    <type>org.exoplatform.ecms.xcmis.sp.DriveCmisRegistry</type>
    <init-params>
        <value-param>
            <name>indexDir</name>
            <value>${gatein.jcr.index.data.dir}/cmis-index${container.name.suffix}</value>
        </value-param>
        ...
    </init-params>
</component>

Indexing atomicity and durability

To provide index consistency and recovery after unexpected crashes or damage, xCMIS uses the write-ahead logging (WAL) technique. Write-ahead logging is a standard approach to transaction logging. Briefly, WAL's central concept is that changes to data files (here, the indexes) must be written only after those changes have been logged, that is, after the log records describing them have been flushed to permanent storage. When this procedure is followed, data pages do not need to be flushed to disk on every transaction commit, because in the event of a crash the index can be recovered by using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)

A major benefit of using WAL is a significantly reduced number of disk writes, because only the log file needs to be flushed to disk at the time of transaction commit, rather than every data file changed by the transaction.
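The log-first rule and roll-forward recovery described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the xCMIS implementation: the class WalIndex, the file names changes.log and index.json, and the JSON record format are all hypothetical.

```python
import json
import os
import tempfile


class WalIndex:
    """Toy index (a dict of uuid -> text) protected by a write-ahead log."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "changes.log")
        self.data_path = os.path.join(directory, "index.json")
        self.data = {}
        if os.path.exists(self.data_path):
            with open(self.data_path) as data_file:
                self.data = json.load(data_file)
        self._redo()

    def commit(self, uuid, text):
        # 1. Log first: append the record and force it to permanent storage.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"uuid": uuid, "text": text}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then change the index. The data file is NOT written here,
        #    so a commit costs a single flush of the log file.
        self.data[uuid] = text

    def checkpoint(self):
        # Persist the data file, then drop the log records it now covers.
        temp_path = self.data_path + ".tmp"
        with open(temp_path, "w") as data_file:
            json.dump(self.data, data_file)
            data_file.flush()
            os.fsync(data_file.fileno())
        os.replace(temp_path, self.data_path)
        if os.path.exists(self.log_path):
            os.remove(self.log_path)

    def _redo(self):
        # Roll-forward recovery: reapply every logged change.
        # A real implementation would also cope with a torn final record.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                record = json.loads(line)
                self.data[record["uuid"]] = record["text"]


workdir = tempfile.mkdtemp()
index = WalIndex(workdir)
index.commit("a1", "hello")
index.commit("b2", "world")
del index  # simulate a crash before any checkpoint was taken

recovered = WalIndex(workdir)  # the constructor replays the log
```

After the simulated crash the data file was never written, yet the recovered index contains both entries because they are redone from the log.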

When the indexer starts, it checks for uncommitted transaction logs. If at least one log exists, the recovery process is started. The indexer reads all logs and collects the added, updated and removed UUIDs into a single set. It then walks through this set and checks whether the object with each UUID still exists. If the object exists, the indexer puts it into the list of added documents; otherwise, the UUID is put into the list of removed documents. Finally, the changes described by these two lists are applied to the index.
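The recovery steps above can be sketched as follows. This is a simplified illustration, not the xCMIS code: the function plan_recovery, the dictionary layout of a log, and the object_exists callback are hypothetical.

```python
def plan_recovery(transaction_logs, object_exists):
    """Split the UUIDs found in uncommitted logs into (added, removed) lists.

    transaction_logs: iterable of dicts with optional "added", "updated"
                      and "removed" lists of UUIDs (assumed layout).
    object_exists:    callable that reports whether the object with a given
                      UUID is still present in the repository.
    """
    # Step 1: collect every UUID mentioned in any log into one set.
    touched = set()
    for log in transaction_logs:
        for kind in ("added", "updated", "removed"):
            touched.update(log.get(kind, ()))

    # Step 2: decide per UUID, based only on whether the object exists now.
    added, removed = [], []
    for uuid in sorted(touched):  # sorted only to make the output stable
        if object_exists(uuid):
            added.append(uuid)
        else:
            removed.append(uuid)
    return added, removed


logs = [
    {"added": ["u1", "u2"], "updated": ["u3"], "removed": []},
    {"added": [], "updated": ["u1"], "removed": ["u2", "u4"]},
]
live_objects = {"u1", "u3"}

added, removed = plan_recovery(logs, live_objects.__contains__)
```

Note that the operation recorded in the log does not decide the outcome: u2 was logged as added and later removed, and it ends up in the removed list simply because the object no longer exists.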

When you run the indexer, it also checks the number of documents in the index. If there are no documents in the index, or if the previous re-indexation was not successful, re-indexation of all content is started. The first step is cleaning the old index data: uncommitted transaction logs and old persistent data are removed, because they are useless once all content is going to be re-indexed. Then, the indexer walks through all objects and creates a Lucene document for each one. The documents are saved to the index in batches of fewer than 100 elements. After re-indexation, all logs (WAL) are removed, because all data mentioned in these change logs is already indexed.
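The decision to re-index and the batched save can be sketched as below. This is an illustration under assumptions, not the xCMIS implementation: the function names are hypothetical, a plain dict stands in for a Lucene document, the flag argument assumes that an unfinished re-indexation is detected through a leftover flag file, and the default batch limit of 99 is chosen only to respect the "fewer than 100 elements" rule.

```python
from itertools import islice


def needs_full_reindex(document_count, reindex_flag_present):
    """Re-index everything if the index is empty or the previous
    re-indexation did not finish (assumed to leave a flag behind)."""
    return document_count == 0 or reindex_flag_present


def batches(items, max_size):
    """Yield consecutive lists of at most max_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, max_size))
        if not batch:
            return
        yield batch


def make_document(obj):
    # Stand-in for building a Lucene document from a repository object.
    return {"uuid": str(obj)}


def reindex(objects, save_batch, max_batch_size=99):
    """Convert every object to a document and save the documents batch by
    batch. Returns the number of documents that were indexed."""
    total = 0
    for batch in batches(objects, max_batch_size):
        save_batch([make_document(obj) for obj in batch])
        total += len(batch)
    return total


saved_batches = []
total_indexed = reindex(range(250), saved_batches.append)
batch_sizes = [len(batch) for batch in saved_batches]
```

With 250 objects and a limit of 99, the sketch saves three batches of 99, 99 and 52 documents.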

Note

If you, as an administrator, get an exception with the message "Can't remove reindex flag.", it means that the index restoration finished but the flag file was not removed (see the file named "reindexProcessing" in the index directory). You can remove this flag file manually to avoid a new re-index of the repository on the next JCR start.

Copyright ©. All rights reserved. eXo Platform SAS