YaCy and Solr Cloud

This is an advanced Solr installation for YaCy that uses the SolrCloud architecture. To follow it, you should be at least somewhat familiar with Debian, Solr and Tomcat.

In this example, we install a SolrCloud of four Solr instances on a single server; each collection is split into four shards, one per instance.

Software Installation

We install Tomcat, Zookeeper and YaCy as standard Debian packages and Solr as a web app for Tomcat.

Tomcat Installation

We will install tomcat as a standard debian system service using apt:

apt-get install tomcat6 tomcat6-examples tomcat6-admin tomcat6-docs

The tomcat web service starts automatically on port 8080, and you can open the default page at http://localhost:8080. The optional packages tomcat6-examples, tomcat6-admin and tomcat6-docs are useful for developing and testing applications, but you can omit them. If you installed them, you can test them after defining an admin user in /etc/tomcat6/tomcat-users.xml:

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
   <role rolename="admin"/>
   <role rolename="admin-gui"/>
   <role rolename="manager"/>
   <role rolename="manager-gui"/>
   <user username="admin" password="tomcat" roles="admin,admin-gui,manager,manager-gui"/>
</tomcat-users>

After setting this, you must restart tomcat with

/etc/init.d/tomcat6 restart

and then you can log in to the manager and host-manager servlets with the user 'admin' and the password 'tomcat'. Please replace the default password 'tomcat' with your own.

The relevant paths for the result of this installation are:

tomcat users:     /etc/tomcat6/tomcat-users.xml
CATALINA_HOME:    /usr/share/tomcat6
CATALINA_BASE:    /var/lib/tomcat6
default web page: /var/lib/tomcat6/webapps/ROOT/index.html

Zookeeper Installation

The SolrCloud peers need a common configuration system which is provided by zookeeper. Zookeeper can be installed with

apt-get install zookeeper zookeeperd

This will create a new user named 'zookeeper'. The relevant paths are at

Zookeeper config: /etc/zookeeper/conf (linked to /etc/zookeeper/conf_example)
Zookeeper data:   /var/lib/zookeeper/
Zookeeper binary: /usr/share/zookeeper/

To check if Zookeeper is running, start the Zookeeper shell:

/usr/share/zookeeper/bin/zkCli.sh

and run shell commands like

ls /
ls /zookeeper

Because Solr is started within tomcat and needs to know the host address of Zookeeper, we must pass this address to tomcat as a JVM option. Open the file /usr/share/tomcat6/bin/catalina.sh and add the following lines at the beginning of the file (right after the leading comments):

# added zookeeper host information used by tomcat to find Solr shards for the SolrCloud
CATALINA_OPTS="$CATALINA_OPTS -DzkHost=localhost:2181"

and restart tomcat:

/etc/init.d/tomcat6 restart
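
Before relying on the restart, it can help to confirm that the option really is set on an active (uncommented) line of the startup script. The following is a minimal sketch; the helper name has_zkhost_opt is hypothetical, and the default path is the one used above.

```shell
# Sketch: report whether a startup script sets the zkHost JVM option.
# has_zkhost_opt is an illustrative helper name, not part of tomcat or Solr.
has_zkhost_opt() {
    script="${1:-/usr/share/tomcat6/bin/catalina.sh}"
    # Succeeds only if an uncommented line mentions -DzkHost=
    grep -Eq '^[[:space:]]*[^#[:space:]].*-DzkHost=' "$script"
}

# Usage (not executed here):
#   has_zkhost_opt && echo "zkHost option found in catalina.sh"
```

The function returns a non-zero status if the option is missing or only appears inside a comment, so it can be used directly in an if statement.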

Solr Installation

Download a Solr release from http://lucene.apache.org/solr/ (Solr 4.5.1 worked, while Solr 4.6.0 did not), e.g.:

cd /opt
wget http://apache.mirrors.spacedump.net/lucene/solr/4.5.1/solr-4.5.1.tgz
tar xfz solr-4.5.1.tgz
ln -s solr-4.5.1 solr
ln -s solr-4.5.1/dist/solr-4.5.1.war solr.war

Because the Solr web app expects logging libraries that are provided by the jetty setup shipped with Solr but not by tomcat, we must add slf4j adapters to the tomcat library:

cd /usr/share/tomcat6/lib/
wget http://www.slf4j.org/dist/slf4j-1.6.6.zip
apt-get install unzip
unzip slf4j-1.6.6.zip
cp slf4j-1.6.6/{jcl-over-slf4j-1.6.6.jar,log4j-over-slf4j-1.6.6.jar,slf4j-api-1.6.6.jar,slf4j-jdk14-1.6.6.jar} .

and restart tomcat:

/etc/init.d/tomcat6 restart
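
If Solr later fails to start with logging-related class loading errors, a first thing to check is whether all four adapter jars actually arrived in the tomcat library directory. This is a small sketch under that assumption; missing_slf4j_jars is a hypothetical helper name, and the jar names and version are those used in the copy step above.

```shell
# Sketch: list SLF4J jars that are missing from a tomcat lib directory.
# missing_slf4j_jars is an illustrative helper, not part of any package.
missing_slf4j_jars() {
    lib_dir="${1:-/usr/share/tomcat6/lib}"
    version="${2:-1.6.6}"
    for name in jcl-over-slf4j log4j-over-slf4j slf4j-api slf4j-jdk14; do
        jar="$lib_dir/$name-$version.jar"
        # Print the full path of every jar that is not present
        [ -f "$jar" ] || echo "$jar"
    done
}

# Usage (not executed here):
#   missing_slf4j_jars    # prints nothing when all four jars are in place
```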

YaCy Installation

Follow the YaCy for Debian installation instructions and select 'webportal' as the network to join (we assume that you do this to create a standalone YaCy, not a peer-to-peer participant; you can of course use this for a 'freeworld' peer as well). The relevant paths are:

YaCy data:          /var/lib/yacy
YaCy log:           /var/log/yacy
YaCy binary:        /usr/share/yacy/
Solr conf for YaCy: /usr/share/yacy/defaults/solr

Software Configuration

The SolrCloud needs a common configuration of the index cores used by YaCy. YaCy uses two cores, 'collection1' and 'webgraph'. Both are defined with a generic index schema, so their configurations are exact clones of each other. It may also be possible to define these cores with non-generic, exactly defined schema.xml files, but we will not do that here because it makes things much more complex.

Zookeeper Client for Solr

First, we need a Zookeeper client for Solr, because Solr provides its own client app to upload the relevant configuration files. We must assemble this client from the libraries inside the Solr war file and additional libraries for logging. We use the already installed war file; you must adapt the paths here if you use a more recent version of Solr:

unzip -q /opt/solr.war -d /tmp/solr-war/
mkdir /usr/share/zookeeper/solr-cli-lib
cp /tmp/solr-war/WEB-INF/lib/* /usr/share/zookeeper/solr-cli-lib/ # solr libs
cp /opt/solr/example/lib/ext/* /usr/share/zookeeper/solr-cli-lib/ # logger libs
rm -Rf /tmp/solr-war

Now we can take advantage of the SolrCloud ZooKeeper CLI commands.

Create Solr Configuration of Solr Cores for YaCy Inside Zookeeper

For a detailed description of the set-up of Solr clusters and a SolrCloud configuration, see the SolrCloud Wiki of apache.org, the SolrCloud Installation in Tomcat, a Guide to SolrCloud Configuration and a SolrCloud Cluster (Single Collection) Deployment. To upload the Solr configuration to Zookeeper, we assemble a config directory from the Solr example config and the YaCy generic schema file schema.xml:

cp -R /opt/solr/example/solr/collection1/conf /opt/yacyconf
cp /usr/share/yacy/defaults/solr/schema.xml /opt/yacyconf/

We can then use that to upload the configuration to zookeeper:

java -classpath ".:/usr/share/zookeeper/solr-cli-lib/*" org.apache.solr.cloud.ZkCLI -zkhost localhost:2181 -cmd upconfig -confdir /opt/yacyconf -confname yacygeneric

That configuration is suitable for both collections, 'collection1' and 'webgraph'. We can therefore link it to both collections:

java -classpath ".:/usr/share/zookeeper/solr-cli-lib/*" org.apache.solr.cloud.ZkCLI -zkhost localhost:2181 -cmd linkconfig -collection collection1 -confname yacygeneric
java -classpath ".:/usr/share/zookeeper/solr-cli-lib/*" org.apache.solr.cloud.ZkCLI -zkhost localhost:2181 -cmd linkconfig -collection webgraph -confname yacygeneric

Let's see what is inside Zookeeper now, e.g. how collection1 is linked to the generic configuration:

/usr/share/zookeeper/bin/zkCli.sh get /collections/collection1
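
The ZkCLI invocations above repeat the same classpath and Zookeeper address. As a convenience, they could be wrapped in a small shell function. This is only a sketch: the function name solr_zkcli and the variables ZK_HOST, SOLR_CLI_LIB and JAVA_CMD are hypothetical names introduced here, while the class name, classpath and options are the ones used above.

```shell
# Sketch: wrapper around the Solr ZkCLI class used in this section.
# ZK_HOST, SOLR_CLI_LIB, JAVA_CMD and solr_zkcli are illustrative names.
ZK_HOST="${ZK_HOST:-localhost:2181}"
SOLR_CLI_LIB="${SOLR_CLI_LIB:-/usr/share/zookeeper/solr-cli-lib}"
JAVA_CMD="${JAVA_CMD:-java}"

solr_zkcli() {
    # The classpath is quoted so that java, not the shell, expands the '*'
    "$JAVA_CMD" -classpath ".:$SOLR_CLI_LIB/*" \
        org.apache.solr.cloud.ZkCLI -zkhost "$ZK_HOST" "$@"
}

# Usage (not executed here):
#   solr_zkcli -cmd upconfig -confdir /opt/yacyconf -confname yacygeneric
#   solr_zkcli -cmd linkconfig -collection collection1 -confname yacygeneric
#   solr_zkcli -cmd linkconfig -collection webgraph -confname yacygeneric
```

Keeping the Zookeeper address in one variable also makes it easier to adapt the commands if Zookeeper does not run on localhost.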

Create Tomcat Configuration of Solr Web Services

We want to use four Solr servers as a SolrCloud, each with two cores ('collection1' and 'webgraph'). We create subdirectories for the servers inside of /var/opt/solrcloud/:

mkdir /var/opt/solrcloud/
mkdir /var/opt/solrcloud/solr0
mkdir /var/opt/solrcloud/solr1
mkdir /var/opt/solrcloud/solr2
mkdir /var/opt/solrcloud/solr3

In each of these directories, put a file named solr.xml. Most descriptions of this file on the web are outdated, since there is a new XML structure for solr.xml for Solr 4.4 and beyond, especially for core discovery with SolrCloud. Put the following content into /var/opt/solrcloud/solr0/solr.xml; for solr1 .. solr3, use the same content with hostContext changed to match the directory name:

<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <int name="coreLoadThreads">4</int>
  <solrcloud>
    <str name="host">localhost</str>            
    <int name="hostPort">8080</int>             
    <str name="hostContext">solr0</str>         
    <str name="zkHost">localhost:2181</str>     
    <int name="zkClientTimeout">${solr.zkclienttimeout:30000}</int>
    <str name="shareSchema">${shareSchema:false}</str>
    <str name="genericCoreNodeNames">${genericCoreNodeNames:true}</str>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
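
Writing the four solr.xml files by hand is repetitive, so the step could be scripted. The sketch below assumes that the files differ only in the hostContext value (the text above shows only the solr0 variant); the function name write_solr_homes and its parameters are hypothetical. The Solr property placeholders are escaped so that the shell writes them literally.

```shell
# Sketch: create solr0 .. solr(N-1) home directories, each with a solr.xml.
# Assumption: only hostContext differs between the instances.
write_solr_homes() {
    base="${1:-/var/opt/solrcloud}"
    count="${2:-4}"
    i=0
    while [ "$i" -lt "$count" ]; do
        home="$base/solr$i"
        mkdir -p "$home"
        # \${...} keeps Solr's property placeholders out of shell expansion
        cat > "$home/solr.xml" <<EOF
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <int name="coreLoadThreads">4</int>
  <solrcloud>
    <str name="host">localhost</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">solr$i</str>
    <str name="zkHost">localhost:2181</str>
    <int name="zkClientTimeout">\${solr.zkclienttimeout:30000}</int>
    <str name="shareSchema">\${shareSchema:false}</str>
    <str name="genericCoreNodeNames">\${genericCoreNodeNames:true}</str>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">\${socketTimeout:0}</int>
    <int name="connTimeout">\${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
EOF
        i=$((i + 1))
    done
}

# Usage (not executed here):
#   write_solr_homes /var/opt/solrcloud 4
```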

Finally, make the path /var/opt/solrcloud/ writable for tomcat6:

chown -R tomcat6 /var/opt/solrcloud/
chgrp -R tomcat6 /var/opt/solrcloud/

To deploy Solr with the YaCy configuration, you must create a Tomcat context fragment for each Solr instance. A Tomcat context fragment is a file in /var/lib/tomcat6/conf/Catalina/localhost, so we must create four files in this directory, one for each Solr server. Write a file to /var/lib/tomcat6/conf/Catalina/localhost/solr0.xml with the following content:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/var/opt/solrcloud/solr0" override="true"/>
</Context>

Copy this file to solr1.xml .. solr3.xml and change the value of the solr/home entry to the solr1 .. solr3 directories accordingly. If you edit these files with emacs, make sure that you delete all backup files ending with '~' from this directory, because they will cause an error. Finally, restart tomcat:

/etc/init.d/tomcat6 restart
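
The copy-and-patch step for the context fragments can also be scripted, which avoids editor backup files altogether. The following is a sketch under the paths used above; write_context_fragments and its parameters are hypothetical names. Run it before the restart.

```shell
# Sketch: write one Tomcat context fragment per Solr instance.
# write_context_fragments is an illustrative helper, not a Tomcat tool.
write_context_fragments() {
    conf_dir="${1:-/var/lib/tomcat6/conf/Catalina/localhost}"
    solr_base="${2:-/var/opt/solrcloud}"
    count="${3:-4}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each fragment points the same war file at a different solr/home
        cat > "$conf_dir/solr$i.xml" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$solr_base/solr$i" override="true"/>
</Context>
EOF
        i=$((i + 1))
    done
}

# Usage (not executed here):
#   write_context_fragments
```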

Create the SolrCloud

The Solr web service is now available at http://localhost:8080/solr0. Open this page to check that the service is up and running. Then use the web service to instantiate the SolrCloud:

curl 'http://localhost:8080/solr0/admin/collections?action=CREATE&name=collection1&numShards=4&replicationFactor=1'
curl 'http://localhost:8080/solr0/admin/collections?action=CREATE&name=webgraph&numShards=4&replicationFactor=1'
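
Both requests differ only in the collection name, so the URL can be built by a small helper and the requests sent in a loop. This is a sketch; collection_create_url and its parameters are hypothetical names, and the URL parameters are exactly those of the two calls above.

```shell
# Sketch: build the Collections API CREATE URL used above.
# collection_create_url is an illustrative helper name.
collection_create_url() {
    name="$1"
    shards="${2:-4}"
    base="${3:-http://localhost:8080/solr0}"
    printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=1\n' \
        "$base" "$name" "$shards"
}

# Usage (not executed here):
#   for c in collection1 webgraph; do
#       curl "$(collection_create_url "$c")"
#   done
```

The URL must stay quoted when passed to curl, because the '&' characters would otherwise be interpreted by the shell.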

Assign the SolrCloud to YaCy

When the SolrCloud is ready and running, it can be assigned to YaCy as a storage server. Open the servlet at http://localhost:8090/IndexFederated_p.html and select the flag "Use remote Solr server(s)". As the server address, enter one of the Solr servers, e.g. http://192.168.4.10:8080/solr0. Finally, uncheck the flag "Use deep-embedded local Solr".

Converted from https://wiki.yacy.net/index.php?title=Dev:SolrCloud, may be outdated