1 Reply Latest reply on Dec 11, 2014 6:11 AM by david.novak

    Distributed persistent cache: Severe errors and losing data

    david.novak

      Hi everybody,

       

      I'm using Infinispan 7.0.2 in distribution mode with persistent data in a file store (config attached). I start one node and insert the data (20M objects, 124 GB in total).
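For context on "it should take over 1/2 of the data": in distribution mode each key hashes to a segment, and each segment is owned by numOwners members. Assuming numOwners = 1 (the attached config is not shown here), a second node should end up owning roughly half of the segments. The toy model below is not Infinispan's consistent-hash implementation. The class name, the use of hashCode() and the round-robin segment assignment are simplifications chosen only for illustration.

```java
public class SegmentOwnershipModel {

    // Toy model only: a key maps to a segment via hashCode(), and segments are
    // dealt out round-robin to the current members (0 .. numMembers-1).
    static int ownerOf(Object key, int numSegments, int numMembers) {
        int segment = Math.floorMod(key.hashCode(), numSegments);
        return segment % numMembers;
    }

    // Fraction of keys 0..numKeys-1 whose owner changes when a second member joins.
    static double fractionMovedOnJoin(int numKeys, int numSegments) {
        int moved = 0;
        for (int k = 0; k < numKeys; k++) {
            if (ownerOf(k, numSegments, 1) != ownerOf(k, numSegments, 2)) {
                moved++;
            }
        }
        return (double) moved / numKeys;
    }

    public static void main(String[] args) {
        // prints "moved to the new node: 0.5"
        System.out.println("moved to the new node: " + fractionMovedOnJoin(1_000_000, 60));
    }
}
```

With numOwners = 2 and two nodes, both nodes would own every segment, so the joiner would receive a full copy rather than half.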

      When I start a second node, it should take over half of the data, but instead one of two things happens:

      1) the communication between the nodes fails, generating an endless stream of severe log entries (part of the log attached), or

      2) the communication seems to finish OK (with a number of WARNINGs, see the third attachment) and the sizes of the disk stores look right (124 GB and 62 GB), but the cache seems to lose half of the data: cache.size() on either node returns about 10M objects (while manager.getMembers().size() returns 2). The entries really are missing: when I call cache.get(entry_id) on either node, neither of them finds the entry.
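To quantify the loss without relying on cache.size(), one option is to probe a known list of IDs with get() on each node. A minimal sketch follows; MissingEntryCheck and findMissing are hypothetical names. It accepts any java.util.Map because org.infinispan.Cache extends ConcurrentMap, and the main method uses a plain HashMap as a stand-in for the cache. For 20M entries you would probably probe a sample of IDs rather than all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingEntryCheck {

    // Returns the keys from expectedKeys for which cache.get(key) returns null.
    static <K> List<K> findMissing(Map<K, ?> cache, Iterable<K> expectedKeys) {
        List<K> missing = new ArrayList<>();
        for (K key : expectedKeys) {
            if (cache.get(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<Long, String> cache = new HashMap<>(); // stand-in for the distributed cache
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < 10; id++) {
            ids.add(id);
            if (id % 2 == 0) {
                cache.put(id, "value-" + id); // only even IDs are present
            }
        }
        List<Long> missing = findMissing(cache, ids);
        // prints "5 of 10 missing: [1, 3, 5, 7, 9]"
        System.out.println(missing.size() + " of " + ids.size() + " missing: " + missing);
    }
}
```

Running the same probe against each node separately shows whether a key is missing cluster-wide or only from one node's view.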

       

      Am I doing something wrong, or should I report this in JIRA?

       

      Thank you a lot!

        • 1. Re: Distributed persistent cache: Severe errors and losing data
          david.novak

          It seems that I have isolated the error: it occurs only when I use distribution mode with await-initial-transfer="false".
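As I understand the Infinispan documentation, await-initial-transfer (enabled by default) makes getCache() on a joining node block until that node has received its initial state; with "false" the call returns while the transfer is still running, which opens the window described in the next paragraph. The toy model below is not Infinispan code, and AwaitInitialTransferModel and startJoiner are hypothetical names; it only contrasts the two start-up behaviours.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AwaitInitialTransferModel {

    // Toy joiner start-up: `transfer` stands in for the initial state transfer.
    // Returns true if the transfer had finished by the time start-up returned.
    static boolean startJoiner(boolean awaitInitialTransfer, Runnable transfer) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> pending = pool.submit(transfer);
            if (awaitInitialTransfer) {
                try {
                    pending.get(); // start-up blocks until all state has arrived
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while awaiting transfer", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("initial transfer failed", e.getCause());
                }
            }
            return pending.isDone();
        } finally {
            pool.shutdown(); // a transfer that was not awaited keeps running in the background
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowTransfer = () -> {
            try {
                release.await(); // simulates a transfer that takes a long time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("await=false, done at start-up: " + startJoiner(false, slowTransfer));
        release.countDown(); // let the background transfer finish
        System.out.println("await=true, done at start-up: " + startJoiner(true, () -> { }));
    }
}
```

With the flag off, the joiner is "up" while the transfer is still in flight, so any interaction with the old owner overlaps the ongoing transfer.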

          In my case, I fill the first node and then start the second one, and the initial transfer takes a long time. Meanwhile, because the second node is not waiting for the initial transfer to finish, it starts communicating with the first node, and that is where the bug appears. It first results in this warning:

           

          WARNING: Node1-47758, site-id=FIMU, rack-id=defaultRack, machine-id=null: I was suspected by Node2-28630, site-id=FIMU, rack-id=defaultRack, machine-id=null; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

           

          and then in an infinite chain of errors:

           

          Dec 10, 2014 9:08:18 AM org.infinispan.persistence.file.SingleFileStore$1 call
          ERROR: ISPN000252: Error executing parallel store task
          org.infinispan.persistence.spi.PersistenceException: java.nio.channels.ClosedByInterruptException
            at org.infinispan.persistence.file.SingleFileStore._load(SingleFileStore.java:475)
            at org.infinispan.persistence.file.SingleFileStore.access$600(SingleFileStore.java:63)
            at org.infinispan.persistence.file.SingleFileStore$1.call(SingleFileStore.java:518)
            at org.infinispan.persistence.file.SingleFileStore$1.call(SingleFileStore.java:514)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
            at java.util.concurrent.FutureTask.run(FutureTask.java:166)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
            at java.util.concurrent.FutureTask.run(FutureTask.java:166)
            at org.infinispan.util.concurrent.WithinThreadExecutor.execute(WithinThreadExecutor.java:22)
            at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
            at org.infinispan.executors.ExecutorAllCompletionService.submit(ExecutorAllCompletionService.java:31)
            at org.infinispan.persistence.file.SingleFileStore.process(SingleFileStore.java:514)
            at org.infinispan.statetransfer.OutboundTransferTask.run(OutboundTransferTask.java:171)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
            at java.util.concurrent.FutureTask.run(FutureTask.java:166)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
            at java.util.concurrent.FutureTask.run(FutureTask.java:166)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
            at java.lang.Thread.run(Thread.java:722)
          Caused by: java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
            at org.infinispan.persistence.file.SingleFileStore._load(SingleFileStore.java:473)
            ... 22 more
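The root cause in this trace is standard java.nio behaviour: if the thread performing a FileChannel read is interrupted, the JDK closes the channel and throws ClosedByInterruptException. The interrupted thread here is running OutboundTransferTask.run, i.e. the first node's outbound state transfer, so one plausible but unconfirmed reading is that the transfer task was cancelled (for example after Node2 suspected Node1) while it was reading from the file store. If the store reads through a single shared channel, every later read would fail as well, which would fit the endless stream of errors. The sketch below reproduces only the JDK behaviour, without Infinispan; InterruptedReadDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptedReadDemo {

    // Returns true if a read made by an interrupted thread fails with
    // ClosedByInterruptException and leaves the channel closed.
    static boolean interruptClosesChannel() {
        try {
            Path file = Files.createTempFile("store", ".dat");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Stand-in for a transfer task being cancelled: set our own interrupt flag.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(4), 0);
                    return false; // not expected: the read succeeded despite the interrupt
                } catch (ClosedByInterruptException e) {
                    // The JDK closed the channel before throwing; later reads on it fail too.
                    return !channel.isOpen();
                } finally {
                    Thread.interrupted(); // clear the flag the exception leaves set
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints "channel closed by interrupt: true"
        System.out.println("channel closed by interrupt: " + interruptClosesChannel());
    }
}
```

Because the channel is closed as a side effect of the interrupt, the failure is not confined to the cancelled task: any other code still holding the same FileChannel sees a closed channel from then on.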