13 Replies Latest reply on Apr 26, 2016 8:16 AM by shawkins

    Warning messages in clustered usage of Teiid Embedded

    pranavk

      Hi,

       

      I am using Teiid Embedded 8.13.Final in clustered mode in my application. For this I am using infinispan-replicated-config.xml file provided in the Teiid runtime jar, and I have modified the Jgroups.xml where I have set "initial_hosts" attribute of TCPPING to have addresses of the 2 nodes in my cluster.

      To switch on clustering I have set both the infinispan-replicated-config file and the Jgroups file in the embedded configuration before I call EmbeddedServer.start(), so that both Teiid and Jgroups use the same jgroups file for initialization.

      Performing these steps with 8.12, which has the older Infinispan and Jgroups, worked fine. After the upgrade, when I start a single instance of my Jetty-based server (in which Teiid is used in embedded mode), I keep seeing the following messages again and again. I see these messages even when only one of the 2 server instances is up. When both are up, even more cross talk starts happening.

       

      Are you able to see this as well? Any suggestions on this?

       

      I see that many people faced this issue with Wildfly 9 and Jgroups. I'll attach a few links.

       

      Thanks,

      Pranav

        • 1. Re: Warning messages in clustered usage of Teiid Embedded
          pranavk

          Forgot to add the messages above.

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-cluster (our cluster is teiid-replicator). Sender was Node-A-8509 (received 3 identical messages from Node-A-8509 in the last 100796 ms

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-replicator (our cluster is teiid-cluster). Sender was PRANAVPC-29332 (received 3 identical messages from PRANAVPC-29332 in the last 103125 ms

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-cluster (our cluster is teiid-replicator). Sender was Node-A-8509 (received 3 identical messages from Node-A-8509 in the last 92501 ms

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-replicator (our cluster is teiid-cluster). Sender was PRANAVPC-29332 (received 3 identical messages from PRANAVPC-29332 in the last 89036 ms

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-cluster (our cluster is teiid-replicator). Sender was Node-A-8509 (received 3 identical messages from Node-A-8509 in the last 101848 ms)

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-replicator (our cluster is teiid-cluster). Sender was PRANAVPC-29332 (received 3 identical messages from PRANAVPC-29332 in the last 72367 ms)

          WARN  [logger] JGRP000012: discarded message from different cluster teiid-cluster (our cluster is teiid-replicator). Sender was Node-A-8509 (received 3 identical messages from Node-A-8509 in the last 83794 ms)

           



          The links I talked about above are:

          https://issues.jboss.org/browse/WFLY-5189

          https://issues.jboss.org/browse/WFLY-4971

          http://stackoverflow.com/questions/32223235/wildfly-9-jgrp000012-discarded-message-from-different-cluster-hq-cluster-our

           

          The second link mentions a solution which involves setting 'log_discard_msgs' property to false in the jgroups file. This stopped the warning messages for me, but deploying in a clustered environment started to fail. I'll try and figure that out.

          This also mentions a solution involving separate Jgroups stacks for the two clusters, teiid-cluster (infinispan) and teiid-replicator (jgroups) in our case. Should that be done in Teiid?

          WFLY-5189 shows the issue as resolved with Wildfly 10 (which includes Jgroups 3.6.5). Would there be any plans to move to that with Teiid 9?

           

           

          Thanks

          • 2. Re: Warning messages in clustered usage of Teiid Embedded
            shawkins

            > I'll try and figure that out.

             

            It seems odd that setting that property would cause a failure on start.  Please let us know what you find out.

             

            > Should that be done in Teiid?

             

            I thought JGroups wanted to make it easier to share stacks, not create multiples, as that adds a lot of overhead.  If possible we should share the stack.

             

            > Would there be any plans to move to that with Teiid 9?

             

            We'll stay with WildFly 9 with Teiid 9.0, then move to WildFly 10 in a subsequent minor or major revision depending upon the scope of changes needed by the platform change.  Either way I'd expect we'll have a release based upon WildFly 10 in about 6 months.

            • 3. Re: Warning messages in clustered usage of Teiid Embedded
              pranavk

              Thanks Steve

               

              > It seems odd that setting that property would cause a failure on start.  Please let us know what you find out.

              Yes, setting the property had nothing to do with the failure. Instead, I observed that once I start the Teiid server in cluster mode (with or without that property, of course) inside my Jetty-based server, deployment failed to go through when I tried deploying on one of the server instances. It seems to hang. I could not go deep into that today, but it's probably when the call reaches the JgroupsObjectReplicator class, where it replicates the GlobalTableStoreImpl object.

              Are you seeing this behavior at your end?

               

              PS: Deployment goes fine when one server is started in cluster mode and the second server isn't started. This suggests they hang while replicating or coordinating once connected through the Jgroups cluster.

              • 4. Re: Warning messages in clustered usage of Teiid Embedded
                pranavk

                Hi Steve, is this issue replicable at your end?

                 

                Any pointers would be great.

                 

                Thanks,

                Pranav

                • 5. Re: Warning messages in clustered usage of Teiid Embedded
                  shawkins

                  > Hi Steve, Is this this issue replicable at your end?

                   

                  You mean a startup issue?  No.  We would need all of the relevant configuration details and ideally thread dumps to assist there.  Is this always occurring for you - or is it somehow related to the logging issue?

                   

                  As for the logging issue, the simplest change does seem to be the log_discard_msgs property, or you could raise the severity threshold of the org.jgroups logging context so that warnings are no longer emitted.
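
For the second option, here is a minimal sketch. It assumes JGroups is logging through java.util.logging in the embedding application; if log4j or another backend is in use, the equivalent setting belongs in that framework's configuration instead. The class name QuietJGroups is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietJGroups {

    // Hold a strong reference: JUL keeps loggers weakly, so a level set on an
    // otherwise unreferenced logger could be lost after garbage collection.
    private static final Logger JGROUPS = Logger.getLogger("org.jgroups");

    /** Raises the org.jgroups threshold so WARN-level messages are dropped. */
    public static Logger raiseThreshold() {
        JGROUPS.setLevel(Level.SEVERE);
        return JGROUPS;
    }

    public static void main(String[] args) {
        Logger logger = raiseThreshold();
        // Child loggers such as org.jgroups.protocols.* inherit this level
        // unless they have one set explicitly.
        System.out.println("WARNING loggable: " + logger.isLoggable(Level.WARNING));
    }
}
```

This would need to run before EmbeddedServer.start(). Note that it hides every JGroups warning, not only JGRP000012, which is the trade-off compared with the log_discard_msgs property.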

                  • 6. Re: Warning messages in clustered usage of Teiid Embedded
                    pranavk

                    Thanks for the suggestions.

                     

                    > or is it somehow related to the logging issue?

                    Logging does not seem to be an issue anymore, that property seems to work fine for now.

                     

                    > You mean a startup issue?

                    I am not facing an issue at startup. I started the 2 different physical machines of my cluster in sequential order. That went fine. Using Java's utility jconsole I could see that Infinispan cluster had formed properly as well. I'll share the config details soon once I have access to those machines.

                    Now after the 2 nodes had started, if I deployed a VDB in one of the nodes, it was then that the call got hung up somewhere. I see that the replicate method of JgroupsObjectReplicator is called at this instance for some sync up process related to Mat views (I suppose). Could that be an area where the two Teiid instances are not able to sync up? Is the VDB deployment going fine for you in this scenario where the nodes are up?

                    • 7. Re: Warning messages in clustered usage of Teiid Embedded
                      shawkins

                      > Now after the 2 nodes had started, if I deployed a VDB in one of the nodes, it was then that the call got hung up somewhere

                       

                      Do you then continue to deploy the vdb at all nodes?  We effectively have an assumption of a homogeneous cluster in terms of deployments.

                      • 8. Re: Warning messages in clustered usage of Teiid Embedded
                        pranavk

                        Yes, in our application VDBs are deployed at all nodes in one go. A deploy action at one server instance dispatches a synchronous 'deploy' event to all the nodes. Hence the call returns for a successful deployment in clustered mode only once the VDB is deployed on all nodes.

                        This mechanism worked fine for us till 8.12. Since upgrade to 8.13, this piece is failing.

                         

                        And just some background: the only change I have made to the application that differs from the default Teiid setup is that I modify the JGroups file, tcp-shared.xml, and Infinispan's infinispan-replicated-config.xml file (both from teiid-runtime.jar) using a DOM parser. In the tcp-shared.xml file I set the initial_hosts attribute to the IP addresses of the cluster members, and in the Infinispan file I just modify the reference path to point to the modified tcp-shared.xml. We of course ran with these modifications in 8.12 as well.
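
The DOM edit described above can be sketched roughly as follows. This is an illustration only, not the poster's actual code: the class InitialHostsPatcher, the method setInitialHosts, the host addresses, and the trimmed-down stack in main are all hypothetical stand-ins, and only JDK XML APIs are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InitialHostsPatcher {

    /** Returns the stack XML with the first TCPPING element's initial_hosts replaced. */
    public static String setInitialHosts(String stackXml, String hosts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stackXml)));

            // The default factory is not namespace-aware, so the plain tag name matches.
            NodeList pings = doc.getElementsByTagName("TCPPING");
            if (pings.getLength() == 0) {
                throw new IllegalArgumentException("no TCPPING element in stack");
            }
            ((Element) pings.item(0)).setAttribute("initial_hosts", hosts);

            // Serialize the modified document back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not rewrite JGroups stack", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for tcp-shared.xml; the real file has many more protocols.
        String stack = "<config xmlns=\"urn:org:jgroups\">"
                + "<TCP bind_port=\"7800\"/>"
                + "<TCPPING initial_hosts=\"localhost[7800]\" port_range=\"0\"/>"
                + "</config>";
        System.out.println(setInitialHosts(stack, "192.168.1.10[7800],192.168.1.11[7800]"));
    }
}
```

The rewritten stack would then be written to a file, and that file's path placed in the Infinispan configuration, before both files are handed to the embedded configuration.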

                        • 9. Re: Warning messages in clustered usage of Teiid Embedded
                          pranavk

                          Attached conf files.

                           

                          My application ensures a homogeneous deployment as mentioned above. But just to break it down further, I just tried the basic scenario where I used the attached infinispan and jgroups configuration file. (I have only modified the initial_hosts attribute in the jgroups file, and the stack-file path in the infinispan file for my application). I set these 2 files in the embedded configuration before calling EmbeddedServer.start().

                          For this scenario, I switched off my application clustering which ensured homogeneity, created a Salesforce based source model, and put it in a VDB (VDB_test1) on both nodes, keeping all the names the same at both nodes. When I attempted to deploy on one node, the deploy call hung, while I immediately got a message on the other node saying - fork-channel for id=VDS_test1.1 not found; discarding message.

                          As described earlier, the deployment never goes through for me in this case too, as was the case with the homogeneous setup of my application ensuring a synchronous deployment (as explained in the post above).

                           

                          Are you able to reproduce it at your end with this info?

                          • 10. Re: Warning messages in clustered usage of Teiid Embedded
                            shawkins

                            Yes, I think I'm seeing the same behavior.  If the sequence goes start server 1, deploy vdb server 1, start server 2, deploy vdb server 2- everything works (which is the case covered by a unit test).  If the sequence is start server 1, start server 2, deploy vdb server 1 - then the deployment won't complete for 5 minutes as it's subject to a timeout from CompositeGlobalTableStore.createInstance.  Attempting to do the deployment in parallel seems to just be a matter of timing as to whether the same timeout is hit.  So we definitely need to improve this either by making the timeout smaller or by better detecting this case.  Can you log something for this?
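
To illustrate the "smaller timeout" idea in general terms, here is a sketch of a bounded wait that falls back to local state when no peer answers in time. It is not Teiid's code and not a description of the eventual fix; BoundedStateWait, awaitPeerState, and the fallback behaviour are assumptions made purely for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStateWait {

    /**
     * Waits up to timeoutMillis for state from a peer. If no peer responds in
     * time (for example, because the peer has not deployed the VDB yet), the
     * caller's fallback is returned instead of blocking the deployment.
     */
    public static <T> T awaitPeerState(CompletableFuture<T> peerState,
                                       long timeoutMillis, T fallback) {
        try {
            return peerState.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // A peer that never answers: the wait gives up after 200 ms.
        CompletableFuture<String> silentPeer = new CompletableFuture<>();
        System.out.println(awaitPeerState(silentPeer, 200, "local-empty-state"));

        // A peer that has already answered: its state is used directly.
        CompletableFuture<String> readyPeer = CompletableFuture.completedFuture("peer-state");
        System.out.println(awaitPeerState(readyPeer, 200, "local-empty-state"));
    }
}
```

The design question is what a safe fallback is: starting from empty local state is only reasonable if a later sync can reconcile the nodes.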

                            • 11. Re: Warning messages in clustered usage of Teiid Embedded
                              shawkins

                              I went ahead and logged [TEIID-4169] Deployment/start sequence issue - JBoss Issue Tracker since I had the changes ready.  This will be addressed in 9.0 and 8.13.4.

                              • 12. Re: Warning messages in clustered usage of Teiid Embedded
                                pranavk

                                Thanks Steve. Exactly, the first sequence you mentioned was going fine but the second one (the more logical use case) did not.

                                 

                                But I am still wondering, was there a change made to this code piece/behavior or could it be a change introduced due to the Jgroups upgrade (coming with wildfly 9)? Because the same parallel deployment mechanism (the synchronous deployment I talked about above) worked perfectly for me till 8.12.

                                • 13. Re: Warning messages in clustered usage of Teiid Embedded
                                  shawkins

                                  > But I am still wondering, was there a change made to this code piece/behavior or could it be a change introduced due to the Jgroups upgrade (coming with wildfly 9)?

                                   

                                  It's likely that there was a JGroups behavioral change.  We went from version 3.2.x to 3.6.x.  Our code directly on top of JGroups did not change much - especially in a way that would affect this behavior.