12 Replies Latest reply on Mar 20, 2009 2:21 AM by krishnan366

    Cluster setup

    krishnan366

      Hi,
      I have two instances of JBoss 5.0.0 GA up and running on two different machines.

      I am using EJB 3.0 stateless session beans to access JBoss Cache. I do have the @Clustered annotation on my bean. I am using HA-JNDI to invoke the remote EJB from the client.

      With both server instances running, the cache is replicated to both, but on server startup the two nodes do not seem to recognise each other. In the server logs I see:

      07:21:15,536 INFO [DEPartition] Number of cluster members: 1
      07:21:15,536 INFO [DEPartition] Other members: 0


      The entire setup behaves like a cluster (for example, if one node is down, requests are routed to the other server), but the log message seems to be incorrect.

      I ran the tests mentioned in the community (JGroups Probe and Draw). They do display both nodes.

      How do I verify that this setup is correct and that the cluster is correctly configured?



        • 1. Re: Cluster setup
          brian.stansberry

          Look for "New cluster view for partition DEPartition" in your logs; that's what's logged when the view changes as another node joins the group.

          You can also look in the jmx-console, mbean jboss:partitionName=DEPartition,service=HAPartition, attribute CurrentView.

          Getting the message you reported on both nodes is a bit odd, unless you started both nodes at the same time (within, say, 5 seconds of each other). If you start both nodes at the same time, you can get that message because neither node has finished starting, so neither can be discovered by the other, and each forms a one-node group. The JGroups MERGE2 protocol (http://www.jboss.org/community/docs/DOC-10896) eventually detects this and combines the two one-node groups into a single group.
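The log check described above can be sketched in a few lines of Java. This is only an illustration: the class `ClusterLogCheck` and its methods are hypothetical (not part of JBoss), and the only log formats assumed are the two strings quoted in this thread ("Number of cluster members: N" and "New cluster view for partition").

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for scanning server log lines; not part of JBoss. */
final class ClusterLogCheck {
    // Matches e.g. "07:21:15,536 INFO [DEPartition] Number of cluster members: 1"
    private static final Pattern MEMBERS =
            Pattern.compile("Number of cluster members: (\\d+)");
    // Text logged when the view changes, as quoted in the reply above.
    private static final String VIEW_CHANGE = "New cluster view for partition";

    /** Returns the last reported member count, or -1 if no such line exists. */
    static int lastMemberCount(List<String> lines) {
        int count = -1;
        for (String line : lines) {
            Matcher m = MEMBERS.matcher(line);
            if (m.find()) {
                count = Integer.parseInt(m.group(1));
            }
        }
        return count;
    }

    /** Returns true if any line reports a cluster view change. */
    static boolean sawViewChange(List<String> lines) {
        for (String line : lines) {
            if (line.contains(VIEW_CHANGE)) {
                return true;
            }
        }
        return false;
    }
}
```

To use it, read your server log into a list of lines (for example with `java.nio.file.Files.readAllLines`) and pass that list to both methods. A member count of 1 with no view-change line on either node suggests the nodes never joined each other.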

          • 2. Re: Cluster setup
            krishnan366

            Hi ,
            Thanks for the response.

            I don't see the message

            "New cluster view for partition DEPartition"
            in either server's log. Do I need to change jboss-log4j.xml to be able to see this message?

            I do see the message
            [org.jboss.cache.RPCManagerImpl] (main) Received new cluster view
            but this lists just the one IP address and does not reflect node 2.

            I did wait a while before starting the second server, say a minute or so.

            Also, CurrentView in the jmx-console lists just one IP address. Should it list both IP addresses?

            Do I need to change any other files for the cluster setup?




            • 3. Re: Cluster setup
              brian.stansberry

              If you only have one item in the view, then the nodes didn't form a cluster.

              You said you already tried some of this stuff, but not sure if you tried it all, so:

              http://www.jboss.org/community/docs/DOC-12375

              http://www.jgroups.org/manual/html/ch02.html, section 2.8 and on.
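The "one item in the view" check can also be done programmatically over JMX. The sketch below uses only the standard `javax.management` API with the MBean name and attribute mentioned earlier in this thread. The class `PartitionViewCheck` is hypothetical, how you obtain the `MBeanServerConnection` for your server is deliberately left out, and the handling of the attribute value (a collection with one entry per node) is an assumption rather than something confirmed here.

```java
import java.util.Collection;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper for reading the partition view over JMX; not part of JBoss. */
final class PartitionViewCheck {
    // MBean name and attribute as given earlier in this thread.
    static final String PARTITION_MBEAN =
            "jboss:partitionName=DEPartition,service=HAPartition";
    static final String VIEW_ATTRIBUTE = "CurrentView";

    /**
     * Returns how many members the CurrentView attribute lists.
     * Assumption: the value is a Collection with one entry per node;
     * any other non-null value is counted as a single member.
     */
    static int viewSize(MBeanServerConnection server) {
        try {
            Object view = server.getAttribute(new ObjectName(PARTITION_MBEAN), VIEW_ATTRIBUTE);
            if (view == null) {
                return 0;
            }
            if (view instanceof Collection) {
                return ((Collection<?>) view).size();
            }
            return 1;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read " + VIEW_ATTRIBUTE, e);
        }
    }

    /** A view with more than one member means the nodes found each other. */
    static boolean formedCluster(MBeanServerConnection server) {
        return viewSize(server) > 1;
    }
}
```

Run against each node in turn: if `formedCluster` is false on both, the nodes are running as two separate one-node groups.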

              • 4. Re: Cluster setup
                krishnan366

                Hi Brian,
                I haven't gone through all the tests yet. I'll do that and get back to you.

                In the meantime I tried setting up a cluster as follows:

                a) Two instances of JBoss 5.0.0 GA running on my local machine (both on the same machine).
                b) I created two different IP addresses using the Microsoft Loopback Adapter.
                c) I started the servers with the -c, -b and ServerPeerID options.
                d) In the application, I have EJB 3.0 stateless session beans that connect to JBoss Cache.

                e) The cluster is set up across the two instances, as the expected message is displayed.


                1) But JBoss Cache displays the message

                WARN [TxInterceptor] Commit failed. Clearing stale locks.


                And the cache data is not available in the database. Is this error because of the loopback adapter?

                2) How is setting up a loopback adapter different from using two IP addresses of the local host (one LAN IP and the other for the wireless adapter)? Without the loopback adapter, the cluster does not form.


                • 5. Re: Cluster setup
                  brian.stansberry

                   

                  "krishnan366" wrote:

                  1) But JBoss Cache displays the message

                  WARN [TxInterceptor] Commit failed. Clearing stale locks.


                  And the cache data is not available in the database. Is this error because of the loopback adapter?


                  If the two nodes are properly forming a cluster, then your loopback adaptor setup looks to be working. Which reduces the likelihood that it's the cause of your problem.
                  Beyond that, you'd need to give a lot more information on the problem than the above to expect any kind of diagnosis.


                  2) How is setting up a loopback adapter different from using two IP addresses of the local host (one LAN IP and the other for the wireless adapter)? Without the loopback adapter, the cluster does not form.


                  If you use two different IPs associated with different NICs, you need to add a multicast route to your routing table.

                  Also, as an FYI to anyone reading this, using localhost as one node's address and an IP on another adapter for the other address will not work on Windows.
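When working out which addresses to bind the two nodes to, it can help to see which interfaces the JVM considers up and multicast-capable. The sketch below uses only the standard `java.net.NetworkInterface` API. The class name `MulticastInterfaces` is hypothetical, and the check says nothing about the routing table itself, so a multicast route may still be needed as described above.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper; not part of JBoss or JGroups. */
final class MulticastInterfaces {
    /** Lists interfaces that are up and report multicast support, with their addresses. */
    static List<String> multicastCapable() {
        List<String> result = new ArrayList<String>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported on this machine
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (nic.isUp() && nic.supportsMulticast()) {
                    // One entry per interface: its name followed by its bound addresses.
                    result.add(nic.getName() + " " + Collections.list(nic.getInetAddresses()));
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("Could not query network interfaces", e);
        }
        return result;
    }
}
```

Printing each entry of `MulticastInterfaces.multicastCapable()` shows the candidate interfaces and addresses on the machine. If an address you pass to -b is missing from the output, that interface is either down or does not report multicast support.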

                  • 6. Re: Cluster setup
                    krishnan366

                    Hi Brian,
                    I am using a JDBCCacheLoader, and on eviction of any node I get the message mentioned in my previous post. Because of this, the nodes are not stored in the database.

                    I then removed the following configuration from my cache-config.xml:


                    <clustering mode="replication">
                      <!-- Defines whether to retrieve state on startup -->
                      <stateRetrieval timeout="20000" fetchInMemoryState="false"/>

                      <!-- Network calls are synchronous. -->
                      <sync replTimeout="20000"/>
                    </clustering>


                    Without this the nodes are persisted and work without any issue. Is something wrong in the above configuration that causes the issue?

                    • 7. Re: Cluster setup
                      krishnan366

                      Reposting the xml




                      <clustering mode="replication" clusterName="DataCacheCluster">
                        <stateRetrieval timeout="20000" fetchInMemoryState="false"/>
                        <sync replTimeout="20000"/>
                      </clustering>


                      • 8. Re: Cluster setup
                        brian.stansberry

                        Please post your full JBoss Cache config file.

                        Are these caches writing to a single shared database?

                        • 9. Re: Cluster setup
                          krishnan366

                          Please find the complete configuration file.

                          Yes, they are writing to a single shared DB

                          <?xml version="1.0" encoding="UTF-8"?>
                          <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.0">

                            <locking isolationLevel="READ_COMMITTED"/>
                            <transaction
                                transactionManagerLookupClass="org.jboss.cache.transaction.GenericTransactionManagerLookup"
                                syncRollbackPhase="false"
                                syncCommitPhase="false"/>

                            <clustering mode="replication">
                              <stateRetrieval timeout="20000" fetchInMemoryState="false"/>
                              <sync replTimeout="20000"/>
                            </clustering>

                            <eviction wakeUpInterval="500">
                              <default algorithmClass="org.jboss.cache.eviction.FIFOAlgorithm" eventQueueSize="100000">
                                <property name="maxNodes" value="100"/>
                                <property name="minTimeToLive" value="36000"/>
                              </default>
                              <region name="/regionA">
                                <property name="maxNodes" value="3"/>
                                <property name="minTimeToLive" value="2000"/>
                              </region>
                            </eviction>

                            <loaders passivation="false" shared="true">
                              <preload><node fqn="/"></node></preload>
                              <loader class="org.jboss.cache.loader.JDBCCacheLoader" async="false" fetchPersistentState="false"
                                      ignoreModifications="false" purgeOnStartup="true">
                                <properties>
                                  cache.jdbc.datasource=java:/ProjectOracleDS
                                  location=./
                                  cache.jdbc.table.drop=false
                                  cache.jdbc.table.name=cache_loader
                                  cache.jdbc.table.primarykey=FQN
                                  cache.jdbc.fqn.column=FQN
                                  cache.jdbc.node.column=cache_data
                                  cache.jdbc.parent.column=parent_fqn
                                  cache.jdbc.sql-concat=1||2
                                </properties>
                              </loader>
                            </loaders>

                          </jbosscache>
                          


                          I get this exception:

                          org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn



                          • 10. Re: Cluster setup
                            brian.stansberry

                            Your config looks fine.

                            You need to provide a lot more context about what is going on. A one line snippet from an exception stack trace with no real information about what your app is doing at the time is useless.

                            • 11. Re: Cluster setup
                              krishnan366

                              I am trying to put data into the cache. Since the JDBCCacheLoader is configured and passivation is false, I expect the data to be available in the database as well.

                              But I don't see the data in the database. I assume the database write fails, hence the message

                              WARN [TxInterceptor] Commit failed. Clearing stale locks.



                              I don't face any problem when I comment out the lines
                              <clustering mode="replication">


                              <stateRetrieval timeout="20000" fetchInMemoryState="false"/>
                              <sync replTimeout="20000"/>

                              </clustering>

                              from my cache config.





                              • 12. Re: Cluster setup
                                krishnan366

                                I commented out the lines below and the cache works as expected:



                                <clustering mode="replication">
                                
                                
                                 <stateRetrieval timeout="20000" fetchInMemoryState="false"/>
                                 <sync replTimeout="20000"/>
                                
                                 </clustering>


                                So should I not have the above in my cache config for a cluster setup?