Hi Sacha/Bela,
I have copied part of server. log from my jboss running in cluster mode . I have two Jboss3.2.2 running in cluster mode. Each jboss is running in heavy load and write/read/remove some data to/from distributed state for each request.
Server.log from Node 1 :
a) After 2-3 hours of test starts getting the below error
2004-04-02 11:07:46,490 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:07:46 CEST 2004] [ERROR] FD.Monitor.run(): ping_dest is null
2004-04-02 11:07:46,728 INFO [STDOUT] [SOAK Queuing ] Connter : 444000
2004-04-02 11:07:48,984 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:07:48 CEST 2004] [ERROR] FD.Monitor.run(): ping_dest is null
2004-04-02 11:07:49,349 INFO [STDOUT] [SOAK Queuing ] Connter : 444100
2004-04-02 11:07:51,466 INFO [STDOUT] [SOAK Queuing ] Connter : 444200
2004-04-02 11:07:51,494 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:07:51 CEST 2004] [ERROR] FD.Monitor.run(): ping_dest is null
2004-04-02 11:07:53,670 INFO [STDOUT] [SOAK Queuing ] Connter : 444300
2004-04-02 11:07:54,004 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:07:54 CEST 2004] [ERROR] FD.Monitor.run(): ping_dest is null
2004-04-02 11:08:09,539 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:09 CEST 2004] [ERROR] FD.Monitor.run(): ping_dest is null
2004-04-02 11:08:09,829 INFO [org.jboss.ha.framework.interfaces.HAPartition.XSAM_Partition_cluster_GlobeTesting] Suspected member: jfk:63378 (additional data: 19 bytes)
2004-04-02 11:08:10,220 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.XSAM_Partition_cluster_GlobeTesting]
b) Then I can see one its showing one node as dead but the other node is still running.
New cluster view (id: 2, delta: -1) : [172.16.11.138:21099]
2004-04-02 11:08:10,224 INFO [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] Dead members: 1
2004-04-02 11:08:10,225 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] trying to remove deadMember 172.16.11.145:21099 for key HAJNDI
2004-04-02 11:08:10,225 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] 172.16.11.145:21099 was removed
2004-04-02 11:08:10,225 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] notifyKeyListeners
2004-04-02 11:08:10,225 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] notifying 1 listeners for key change: HAJNDI
2004-04-02 11:08:10,225 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] trying to remove deadMember 172.16.11.145:21099 for key DCacheBridge-DefaultJGBridge
2004-04-02 11:08:10,225 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] 172.16.11.145:21099 was removed
2004-04-02 11:08:10,226 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] notifyKeyListeners
2004-04-02 11:08:10,226 DEBUG [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] notifying 1 listeners for key change: DCacheBridge-DefaultJGBridge
2004-04-02 11:08:10,274 INFO [STDOUT] [SOAK Queuing ] Connter : 444400
2004-04-02 11:08:13,349 INFO [STDOUT] [SOAK Queuing ] Connter : 444500
2004-04-02 11:08:16,127 INFO [STDOUT] [SOAK Queuing ] Connter : 444600
2004-04-02 11:08:18,844 INFO [STDOUT] [SOAK Queuing ] Connter : 444700
2004-04-02 11:08:21,708 INFO [STDOUT] [SOAK Queuing ] Connter : 444800
2004-04-02 11:08:24,732 INFO
c) Again the machine came back in cluster
[org.jboss.ha.framework.interfaces.HAPartition.lifecycle.XSAM_Partition_cluster_GlobeTesting] New cluster view (id: 3, delta: 1) : [172.16.11.138:21099, 172.16.11.145:65263]
2004-04-02 11:08:24,733 INFO [XSAM_Partition_cluster_GlobeTesting:ReplicantManager] Dead members: 0
2004-04-02 11:08:24,770 INFO [STDOUT] [SOAK Queuing ] Connter : 444900
2004-04-02 11:08:24,956 ERROR
d) Then this exception
[org.jboss.ha.framework.interfaces.HAPartition.XSAM_Partition_cluster_GlobeTesting] GetState failed
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:762)
at java.util.HashMap$EntryIterator.next(HashMap.java:804)
at java.util.HashMap.writeObject(HashMap.java:956)
at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:795)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1294)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1245)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1052)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:278)
at java.util.HashMap.writeObject(HashMap.java:958)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:795)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1294)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1245)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1052)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:278)
at java.util.HashMap.writeObject(HashMap.java:958)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:795)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1294)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1245)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1052)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:278)
at org.jboss.ha.framework.server.HAPartitionImpl.objectToByteBuffer(HAPartitionImpl.java:146)
at org.jboss.ha.framework.server.HAPartitionImpl.getState(HAPartitionImpl.java:346)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passUp(MessageDispatcher.java:462)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:292)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:515)
at org.jgroups.JChannel.up(JChannel.java:860)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:314)
at org.jgroups.stack.ProtocolStack.receiveUpEvent(ProtocolStack.java:330)
at org.jgroups.stack.Protocol.passUp(Protocol.java:470)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:103)
at org.jgroups.stack.UpHandler.run(Protocol.java:55)
Server.log from Node 2:
a) First throws the below error continue for sometime.
2004-04-02 11:08:22,201 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:22 CEST 2004] [ERROR] UDP.down(): dest address of message is null, and sending to default address fails as mcast_addr is null, too ! Discarding message MethodCall (name=DistributedState._set, number of args=3)
Args:
[XSAMTransactionLog (java.lang.String)]
[0123456789:020404112151 (java.lang.String)]
[{MT_MSG_IDS=[Ljava.lang.String;@123f331, MT_NO_OF_MSG=1, MT_TIMESTAMP=1080896902190} (java.util.TreeMap)]
2004-04-02 11:08:22,213 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:22 CEST 2004] [ERROR] MessageDispatcher.up(): corr == null
2004-04-02 11:08:22,219 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:22 CEST 2004] [ERROR] GroupRequest.execute(): both corr and transport are null, cannot send group request
2004-04-02 11:08:22,221 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:22 CEST 2004] [ERROR] GroupRequest.execute(): both corr and transport are null, cannot send group request
2004-04-02 11:08:22,227 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:22 CEST 2004] [ERROR] GroupRequest.execute(): both corr and transport are null, cannot send group request
Â…
Â…
Â…
Â…
b) then the error below and after sometimes continue “message with seqno=23 not found in” error
2004-04-02 11:08:24,840 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:24 CEST 2004] [ERROR] MessageDispatcher.up(): corr == null
2004-04-02 11:08:24,840 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:24 CEST 2004] [ERROR] MessageDispatcher.up(): corr == null
2004-04-02 11:08:24,840 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:24 CEST 2004] [ERROR] MessageDispatcher.up(): corr == null
2004-04-02 11:08:25,395 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:25 CEST 2004] [ERROR] NAKACK.handleXmitReq(): (requester=jfk:65263) message with seqno=22 not found in sent_msgs ! sent_msgs=94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 102 101 100 99 98 97 96 95
2004-04-02 11:08:25,396 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:25 CEST 2004] [ERROR] NAKACK.handleXmitReq(): (requester=jfk:65263) message with seqno=23 not found in sent_msgs ! sent_msgs=94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 102 101 100 99 98 97 96 95
Â…
Â…
004-04-02 11:08:26,004 DEBUG [org.javagroups.XSAM_Partition_cluster_GlobeTesting] [Fri Apr 2 11:08:26 CEST 2004] [ERROR] NAKACK.handleXmitReq(): (requester=jfk:65263) message with seqno=34 not found in sent_msgs ! sent_msgs=
Please give your valuable suggestion to resolve this issue. When “with seqno=34 not found in sent_msgs ! sent_msgs=” error starts coming jboss created 20 MB big server.log file in each 3-4 minutes.
Thanks in advance,
--monu