Get Socket closed exception, the nodes leaves the group and doesn't re-join anymore
nsemchenkov Dec 11, 2010 4:00 AMThe JBoss server works fine as a cluster member for a while and then all of a sudden gets "Socket closed" exception. After that its not a member of the cluster anymore.
Please note, we have set shun=false trying to keep it in the cluster but it didnt help
In the past we had JBoss 4.2 running on the same hardware and had no issues. So It's probably not a networking problem
Please find below an excerpt from the cluster.log
cluster.log:
2010-12-10 09:16:14,160 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:16:14,161 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:16:14,161 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:16:14,548 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:7901 (own address=10.170.1.24:7900) 2010-12-10 09:17:06,591 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.24:7901: java.net.SocketException: Socket closed 2010-12-10 09:17:06,628 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:47583 (own address=10.170.1.24:38487) 2010-12-10 09:17:06,638 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:47583 (number=0) 2010-12-10 09:17:06,640 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:7901 (own address=10.170.1.24:7900) 2010-12-10 09:17:06,644 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0) 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0) 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948) 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0) 2010-12-10 09:17:06,852 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.25:7900: java.net.SocketException: Socket closed 2010-12-10 09:17:07,960 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.17:7900: java.net.SocketException: Socket closed 2010-12-10 09:17:08,113 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.16:7900: java.net.SocketException: Socket closed 2010-12-10 09:17:08,167 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK 2010-12-10 09:17:08,176 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK 2010-12-10 09:17:08,177 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK 2010-12-10 09:17:08,177 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK 2010-12-10 09:17:08,213 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:33585; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK 2010-12-10 09:17:08,256 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:33585; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK 2010-12-10 09:17:08,297 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.15:7901: java.net.SocketException: Broken pipe 2010-12-10 09:17:08,309 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding 2010-12-10 09:17:08,309 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding 2010-12-10 09:17:08,310 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.170.1.15:33585, 10.170.1.24:39588] 2010-12-10 09:17:08,310 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.17 0.1.15:33585, 10.170.1.24:39588] 2010-12-10 09:17:08,310 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399 , 10.170.1.15:33585, 10.170.1.24:39588]; discarding view 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.170.1.15:33585, 10.170.1.24:39588] 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.17 0.1.15:33585, 10.170.1.24:39588] 2010-12-10 09:17:08,312 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399 , 10.170.1.15:33585, 10.170.1.24:39588]; discarding view 2010-12-10 09:17:08,312 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding 2010-12-10 09:17:08,314 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,314 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17 0.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,314 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585 , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17 0.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17 0.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,315 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585 , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view 2010-12-10 09:17:08,315 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585 , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view 2010-12-10 09:17:08,316 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up 2010-12-10 09:17:08,316 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up 2010-12-10 09:17:08,317 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding 2010-12-10 09:17:08,378 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,378 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17 0.1.15:54399, 10.170.1.24:39588] 2010-12-10 09:17:08,379 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585 , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view 2010-12-10 09:17:08,379 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up 2010-12-10 09:17:08,678 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.15:7900: java.net.SocketException: Socket closed 2010-12-10 09:17:08,877 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.16:7901: java.net.SocketException: Socket closed 2010-12-10 09:17:12,126 WARN [org.jgroups.protocols.VIEW_SYNC] discarding view as I (10.170.1.24:60948) am not member of view ([10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]) 2010-12-10 09:17:12,648 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
We use default JGroups configuration from JBoss 5.1 server.
Can anybody advise how we can investigate this problem?