6 Replies Latest reply on Nov 1, 2012 12:18 AM by clebert.suconic

    Journal file replication NPE issue

    dengyong

      HornetQ: 2.3 BETA

      Topology: Two HornetQ has been configured to form HA live backup group. The HA mode is date replication

       

      Steps:

       

      1. Start master

      2. Start backup

      3. Send some durable messages to master

      4. Stop master

      5. backup take over and then stop the backup

      6. Retry some rounds of above

       

      Finally, seems HornetQ journal data on two servers will be kind of out of sync.

      During journal file replication, there will be NPE thrown in backup server.

       

       

       

       

       

      18:26:37,455 INFO  [org.hornetq.integration.bootstrap] HQ101001: Starting HornetQ Server

       

      18:26:39,080 INFO  [org.hornetq.core.server] HQ111045: Configuration option clustered is deprecated. Consult the manual for details.

       

      18:26:39,096 WARN  [org.hornetq.core.server] HQ112054: AIO was not located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal

       

      18:26:39,205 INFO  [org.hornetq.core.server] HQ111001: backup server is starting with configuration HornetQ Configuration (clustered=true,backup=true,sharedStore=false,journalDirectory=../data/journal,bindingsDirectory=../data/bindings,largeMessagesDirectory=../data/large-messages,pagingDirectory=../data/paging)

       

      18:26:39,236 WARN  [org.hornetq.core.server] HQ112216: Moving data directory ../data/bindings to ..\data\bindings1

       

      18:26:39,236 WARN  [org.hornetq.core.server] HQ112216: Moving data directory ../data/paging to ..\data\paging1

       

      18:26:39,236 WARN  [org.hornetq.core.server] HQ112216: Moving data directory ../data/large-messages to ..\data\large-messages1

       

      18:26:39,299 INFO  [org.hornetq.core.server] HQ111017: Using NIO Journal

       

      18:26:39,314 WARN  [org.hornetq.core.server] HQ112008: Security risk! HornetQ is running with the default cluster admin user and default password. Please see the HornetQ user guide, cluster chapter, for instructions on how to change this.

       

      18:26:42,314 INFO  [org.hornetq.core.server] HQ111037: Waiting to become backup node

       

      18:26:42,330 INFO  [org.hornetq.core.server] HQ111038: ** got backup lock

       

      18:26:42,330 INFO  [org.hornetq.core.server] HQ001111: HornetQ Backup Server version 2.3.0.BETA1 (HornetQ sting, 122) [3a6fc34f-1dc2-11e2-a6fc-c979bbfccd4c] started, waiting live to fail before it gets active

       

      18:26:44,611 WARN  [org.hornetq.core.server] HQ112130: error handling packet PACKET(ReplicationStartSyncMessage)[type=120, channelID=2, packetObject=ReplicationStartSyncMessage] for replication: HornetQException[errorType=IO_ERROR message=HQ149001: failed to rename file hornetq-data-2.hq.tmp to hornetq-data-2.hq]

          at org.hornetq.core.journal.impl.AbstractSequentialFile.renameTo(AbstractSequentialFile.java:159) [hornetq-journal.jar:]

          at org.hornetq.core.journal.impl.JournalFilesRepository.createFile(JournalFilesRepository.java:624) [hornetq-journal.jar:]

          at org.hornetq.core.journal.impl.JournalFilesRepository.createRemoteBackupSyncFile(JournalFilesRepository.java:572) [hornetq-journal.jar:]

          at org.hornetq.core.journal.impl.JournalImpl.createFilesForBackupSync(JournalImpl.java:3067) [hornetq-journal.jar:]

          at org.hornetq.core.replication.ReplicationEndpoint.handleStartReplicationSynchronization(ReplicationEndpoint.java:585) [hornetq-core.jar:]

          at org.hornetq.core.replication.ReplicationEndpoint.handlePacket(ReplicationEndpoint.java:214) [hornetq-core.jar:]

          at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:548) [hornetq-core-client.jar:]

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:550) [hornetq-core-client.jar:]

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:511) [hornetq-core-client.jar:]

          at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1627) [hornetq-core-client.jar:]

          at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:72) [hornetq-core-client.jar:]

          at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:95) [jboss-mc.jar:]

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [jboss-mc.jar:]

          at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792) [jboss-mc.jar:]

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) [jboss-mc.jar:]

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169) [hornetq-core-client.jar:]

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134) [hornetq-core-client.jar:]

          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75) [jboss-mc.jar:]

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [jboss-mc.jar:]

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [jboss-mc.jar:]

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [jboss-mc.jar:]

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [jboss-mc.jar:]

          at org.jboss.netty.channel.socket.oio.OioWorker.process(OioWorker.java:72) [jboss-mc.jar:]

          at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73) [jboss-mc.jar:]

          at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:52) [jboss-mc.jar:]

          at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102) [jboss-mc.jar:]

          at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [jboss-mc.jar:]

          at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:176) [jboss-mc.jar:]

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_26]

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_26]

          at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_26]

        • 1. Re: Journal file replication NPE issue
          clebert.suconic

          It seems there's an authorization issue at the directory you used?

           

          I also improved some of the notifications on IOExceptions.. perhaps if you used master (from git) you would get some better report on what the root-cause of the exception.

          • 2. Re: Journal file replication NPE issue
            dengyong

            No, It is not authorization issue. I am using windows xp OS.

            I have some investigation and figure out the reason. It is because data folder "data/journal" fail to be renamed in org.hornetq.core.server.impl.HornetQServerImpl.SharedNothingBackupActivation.moveServerData(). At that time, server lock file server.lock in "data/journal" is opened.

            1 of 1 people found this helpful
            • 3. Re: Journal file replication NPE issue
              dengyong

              yong deng wrote:

               

              No, It is not authorization issue. I am using windows xp OS.

              I have some investigation and figure out the reason. It is because data folder "data/journal" fail to be renamed in org.hornetq.core.server.impl.HornetQServerImpl.SharedNothingBackupActivation.moveServerData(). At that time, server lock file server.lock in "data/journal" is opened.

              Bug HORNETQ-1074 is filed. The link https://issues.jboss.org/browse/HORNETQ-1074

              • 4. Re: Journal file replication NPE issue
                clebert.suconic

                The question is why you had the failure on renaming the server... it could be an authorization issue, right?

                • 5. Re: Journal file replication NPE issue
                  dengyong

                  Clebert Suconic wrote:

                   

                  The question is why you had the failure on renaming the server... it could be an authorization issue, right?

                  No, it is not authorization issue.

                  It is because when renaming "data/journal" folder, server.lock file in "data/journal" folder is opened by HornetQ. On windows OS, if you rename a folder and a file in the folder is opened, it will fail.

                   

                   

                  My previous reply should answer your question.I re-paste my previous reply here.

                   

                  No, It is not authorization issue. I am using windows xp OS.

                  I have some investigation and figure out the reason. It is because data folder "data/journal" fail to be renamed in org.hornetq.core.server.impl.HornetQServerImpl.SharedNothingBackupActivation.moveServerData(). At that time, server lock file server.lock in "data/journal" is opened.

                  • 6. Re: Journal file replication NPE issue
                    clebert.suconic

                    ok... I renamed your post to make it clearer it's an issue on windows. it makes sense!

                     

                     

                    I have asked Francisco to take a look (assigned it to him).