Whenever relational database is used to store multilingual text data of eXo Java Content Repository, it is necessary to adapt configuration in order to support UTF-8 encoding. Here is a short instruction for several supported RDBMS with examples.
Modify the repository-configuration.xml
file which can be found in various locations.
The jdbcjcr
datasource used in examples can be
configured via the InitialContextInitializer
component.
In order to run multilanguage JCR on an Oracle backend Unicode
encoding for characters set should be applied to the database. Other
Oracle globalization parameters do not make any impact. The only property
to modify is NLS_CHARACTERSET
.
We have tested NLS_CHARACTERSET
=
AL32UTF8
and it works well for many European and
Asian languages.
Example of the database configuration:
NLS_LANGUAGE AMERICAN NLS_TERRITORY AMERICA NLS_CURRENCY $ NLS_ISO_CURRENCY AMERICA NLS_NUMERIC_CHARACTERS ., NLS_CHARACTERSET AL32UTF8 NLS_CALENDAR GREGORIAN NLS_DATE_FORMAT DD-MON-RR NLS_DATE_LANGUAGE AMERICAN NLS_SORT BINARY NLS_TIME_FORMAT HH.MI.SSXFF AM NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR NLS_DUAL_CURRENCY $ NLS_COMP BINARY NLS_LENGTH_SEMANTICS BYTE NLS_NCHAR_CONV_EXCP FALSE NLS_NCHAR_CHARACTERSET AL16UTF16
JCR does not use the NVARCHAR columns so that the value of the
NLS_NCHAR_CHARACTERSET
parameter does not matter for JCR.
Create database with Unicode encoding and use Oracle dialect for the Workspace Container:
<workspace name="collaboration">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr" />
<property name="dialect" value="oracle" />
<property name="multi-db" value="false" />
<property name="max-buffer-size" value="200k" />
<property name="swap-directory" value="target/temp/swap/ws" />
</properties>
.....
DB2 Universal Database (DB2 UDB) supports UTF-8 and UTF-16/UCS-2. When a Unicode database is created, CHAR, VARCHAR, LONG VARCHAR data are stored in UTF-8 form. It is enough for JCR multi-lingual support.
Example of UTF-8 database creation:
DB2 CREATE DATABASE dbname USING CODESET UTF-8 TERRITORY US
Create database with UTF-8 encoding and use db2 dialect for Workspace Container on DB2 v.9 and higher:
<workspace name="collaboration">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr" />
<property name="dialect" value="db2" />
<property name="multi-db" value="false" />
<property name="max-buffer-size" value="200k" />
<property name="swap-directory" value="target/temp/swap/ws" />
</properties>
.....
For DB2 v.8.x support change the property "dialect" to db2v8.
JCR MySQL-backend requires special dialect MySQL-UTF8 to be used for internationalization support. But the database default charset should be latin1 to use limited index space effectively (1000 bytes for MyISAM engine, 767 for InnoDB). If database default charset is multibyte, a JCR database initialization error is thrown concerning index creation failure. In other words, JCR can work on any singlebyte default charset of database, with UTF8 supported by MySQL server. But we have tested it only on latin1 database default charset.
Repository configuration, workspace container entry example:
<workspace name="collaboration">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr" />
<property name="dialect" value="mysql-utf8" />
<property name="multi-db" value="false" />
<property name="max-buffer-size" value="200k" />
<property name="swap-directory" value="target/temp/swap/ws" />
</properties>
.....
You will also need to indicate the charset name either at the server level using the
--character-set-server
server parameter
(See more details
here) or at the datasource configuration level by adding a new property as below:
<property name="connectionProperties"
value="useUnicode=yes;characterEncoding=utf8;characterSetResults=UTF-8;" />
On PostgreSQL/PostgrePlus-backend, multilingual support can be enabled in different ways:
Using the locale features of the operating system to provide locale-specific collation order, number formatting, translated messages, and other aspects. UTF-8 is widely used on Linux distributions by default, so it can be useful in such case.
Providing a number of different character sets defined in the PostgreSQL/PostgrePlus server, including multiple-byte character sets, to support storing text of any languages, and providing character set translation between client and server. It is recommended that you use the UTF-8 database charset, it will allow any-to-any conversations and make this issue transparent for the JCR.
Create database with UTF-8 encoding and use a PgSQL dialect for Workspace Container:
<workspace name="collaboration">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" valBut some of our customersue="jdbcjcr" />
<property name="dialect" value="pgsql" />
<property name="multi-db" value="false" />
<property name="max-buffer-size" value="200k" />
<property name="swap-directory" value="target/temp/swap/ws" />
</properties>
.....