5 Replies Latest reply on Nov 18, 2015 10:15 AM by asaf zinger

    Lucene over Infinispan index corruption

    asaf zinger Newbie

      We are using Lucene over Infinispan with 2 active nodes; Infinispan persists to files.

      Sometimes restarting one of the nodes causes Lucene index corruption. We see this exception in the logs when trying to read from the index:

       

      java.io.FileNotFoundException: Error loading metadata for index file: _78.si|M|skywareAccountsIndex

              at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:134)

              at org.infinispan.lucene.impl.DirectoryLuceneV4.openInput(DirectoryLuceneV4.java:101)

              at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)

              at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)

              at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:361)

              at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:57)

              at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:907)

              at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)

              at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:67)

       

       


       

      The Infinispan configuration is:

      <replicated-cache name="ACCOUNT_LUCENE_METADATA_CACHE" mode="SYNC">

                              <expiration interval="-1" />

                              <locking striping="false" />

                              <persistence passivation="false">

                                      <file-store shared="false" preload="true" fetch-state="true" purge="false"

                                              path="/var/skyfence/management/management_dbs/cache" />

                              </persistence>

                      </replicated-cache>

       

       

                      <replicated-cache name="ACCOUNT_LUCENE_DATA_CACHE" mode="SYNC">

                              <expiration interval="-1" />

                              <locking striping="false" />

                              <persistence passivation="false">

                                      <file-store shared="false" preload="false" fetch-state="true" purge="false"

                                              path="/var/skyfence/management/management_dbs/cache" />

                              </persistence>

                      </replicated-cache>

       

      I understand from the Lucene documentation that Lucene over a file system should be tolerant of crashes. Is that also the case for Lucene over Infinispan?

      Any idea what we are doing wrong?

        • 1. Re: Lucene over Infinispan index corruption
          Gustavo Fernandes Novice

          Hi, what kind of architecture does your system have? Is data stored in Infinispan or in a relational database?  Can you post the full indexing configuration and also which Infinispan version you are using?

          • 2. Re: Lucene over Infinispan index corruption
            asaf zinger Newbie

            Hi,

            We are using Infinispan 7.2.1.Final.

             

            We have 2 servers (running on Tomcat) with Lucene over Infinispan. Infinispan persists either to files or to a database, depending on the index.

             

            Here is the configuration:

            <?xml version="1.0" encoding="UTF-8"?>

            <infinispan>

             

             

              <jgroups>

                <stack-file name="tcpStack" path="infinispan-tcp-discovery.xml" />

              </jgroups>

              <cache-container default-cache="CM_SERVICE_TYPES">

                <jmx duplicate-domains="true" />

                <transport stack="tcpStack" cluster="sampleCluster" />

                <replicated-cache name="SKYWARE_SERVICE_LUCENE_METADATA_CACHE" mode="SYNC">

                  <expiration interval="-1" />

                  <locking striping="false" />

                  <persistence passivation="false">

                    <file-store shared="false" preload="true" fetch-state="true" purge="false"

                      path="/var/skyfence/management/management_dbs/cache" />

                  </persistence>

                </replicated-cache>

                <replicated-cache name="SKYWARE_SERVICE_LUCENE_DATA_CACHE" mode="SYNC">

                  <expiration interval="-1" />

                  <locking striping="false" />

                  <persistence passivation="false">

                    <file-store shared="false" preload="false" fetch-state="true" purge="false"

                      path="/var/skyfence/management/management_dbs/cache" />

                  </persistence>

                </replicated-cache>

                <replicated-cache name="SKYWARE_SERVICE_LUCENE_LOCKING_CACHE" mode="SYNC" start="EAGER" />

              </cache-container>

             

             

            </infinispan>

            • 3. Re: Lucene over Infinispan index corruption
              Gustavo Fernandes Novice

              Thanks, could you also provide info on how do you coordinate the writing to the Lucene index among the two active nodes?

              • 4. Re: Lucene over Infinispan index corruption
                asaf zinger Newbie

                Thanks, sure.

                We implemented a distributed lock over Postgres.

                Whenever an index is opened for writing, we lock it across the cluster, so it is not possible to write to it from two different nodes at the same time. Read operations are not locked.
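
                The locking scheme described above can be sketched roughly as follows. This is an illustration only: the `ClusterLock` interface, the `InMemoryClusterLock` stand-in and the `writeExclusively` helper are hypothetical names, and the in-memory map merely takes the place of the Postgres-backed lock, whose actual implementation is not shown in this thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class IndexWriteLockSketch {

    /** Hypothetical cluster-wide lock, one lock per index name. */
    interface ClusterLock {
        boolean tryAcquire(String indexName, String nodeId);
        void release(String indexName, String nodeId);
    }

    /** In-memory stand-in (assumption) so the sketch runs without a database. */
    static final class InMemoryClusterLock implements ClusterLock {
        private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

        @Override
        public boolean tryAcquire(String indexName, String nodeId) {
            // putIfAbsent returns null only if no node currently holds the lock.
            return owners.putIfAbsent(indexName, nodeId) == null;
        }

        @Override
        public void release(String indexName, String nodeId) {
            // Removes the entry only if this node is the current owner.
            owners.remove(indexName, nodeId);
        }
    }

    /**
     * Runs writeAction only while holding the lock for indexName.
     * Returns false without running it if another node holds the lock.
     */
    static boolean writeExclusively(ClusterLock lock, String indexName,
                                    String nodeId, Runnable writeAction) {
        if (!lock.tryAcquire(indexName, nodeId)) {
            return false;
        }
        try {
            writeAction.run(); // e.g. open the index for writing, add documents, commit
            return true;
        } finally {
            lock.release(indexName, nodeId);
        }
    }

    public static void main(String[] args) {
        ClusterLock lock = new InMemoryClusterLock();
        String index = "skywareAccountsIndex";

        boolean nodeAWrote = writeExclusively(lock, index, "nodeA", () -> {
            // While nodeA holds the lock, an attempt by nodeB is rejected.
            boolean nodeBWrote = writeExclusively(lock, index, "nodeB", () -> { });
            System.out.println("nodeB wrote during nodeA's write: " + nodeBWrote);
        });
        System.out.println("nodeA wrote: " + nodeAWrote);
    }
}
```

                Readers never call `tryAcquire` in this sketch, which mirrors the statement that read operations are not locked.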

                • 5. Re: Lucene over Infinispan index corruption
                  asaf zinger Newbie

                  After further investigation we found that with an ASYNC Infinispan configuration the Lucene index is more likely to get corrupted, even though we are writing to the index from only one node:

                  <replicated-cache name="ACCOUNT_LUCENE_METADATA_CACHE"

                                  mode="ASYNC">

                                  <expiration interval="-1" />

                                  <locking striping="false" />

                                  <persistence passivation="false">

                                                  <file-store shared="false" preload="true" fetch-state="true" purge="false"

                                                                  path="/var/skyfence/management/management_dbs/cache" />

                                  </persistence>

                  </replicated-cache>

                   

                  <replicated-cache name="ACCOUNT_LUCENE_DATA_CACHE"

                                  mode="ASYNC">

                                  <expiration interval="-1" />

                                  <locking striping="false" />

                                  <persistence passivation="false">

                                                  <file-store shared="false" preload="false" fetch-state="true" purge="false"

                                                                  path="/var/skyfence/management/management_dbs/cache" />

                                  </persistence>

                  </replicated-cache>

                   

                  <replicated-cache name="ACCOUNT_LUCENE_LOCKING_CACHE"

                                  mode="SYNC" start="EAGER" />

                  We were not able to reproduce the index corruption when using SYNC mode but we are still trying.

                  Is there an explanation for why this can happen?