5 Replies Latest reply on Oct 15, 2009 3:29 PM by clebert.suconic

    A few observations on the committed replication code

    timfox

      Looks good, a few questions though:

      1) You don't need to create a new ReplicationContext on every invocation on the session - you could re-use the previous one (a session is single threaded)

      2) Currently the replication context runs its tasks when the pending count reaches zero. This doesn't seem correct to me, since the pending count could reach zero before all replication responses for an operation have come back. E.g.

      You're assuming this occurs:

      send a message
      replicate op1
      pending count is now 1
      replicate op2
      pending count is now 2
      response for op1 comes back
      pending count is now 1
      response for op2 comes back
      pending count is now 0
      response packet is now sent back to user

      However, the replication response for op1 could come back *before* op2 is sent

      I.e.

      send a message
      replicate op1
      pending count is now 1
      response for op1 comes back
      pending count is now 0 !!
      response packet is sent back to user
      replicate op2
      pending count is now 1
      response for op2 comes back
      pending count is now 0 again

      We discussed this before. You need to add a method, called by the session, so that the response packet isn't sent until all replication responses have returned.

      3) ReplicationContext has heavy synchronization - this seems unnecessary and could cause heavy contention - you could use a ConcurrentLinkedQueue like we did previously with the replication response queue

      4) Why are replication actions being executed on a different executor? Why not the current thread?
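To make point 2 concrete, here is a minimal single-threaded sketch that replays the second trace. `CountingContext`, `replicationSent`, `replicationDone`, `addTask` and `complete` are hypothetical names, not the committed ReplicationContext API, and the sketch assumes the "send response" task is already registered when the response for op1 arrives, which is the premise of the trace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the replication context; names are illustrative only. */
final class CountingContext {
    private final boolean waitForComplete; // false = run tasks whenever the count hits zero
    private int pending;                   // replications sent but not yet acknowledged
    private boolean complete;              // set once the session has issued everything
    private final List<Runnable> tasks = new ArrayList<>();

    CountingContext(boolean waitForComplete) {
        this.waitForComplete = waitForComplete;
    }

    void addTask(Runnable task) { tasks.add(task); }

    void replicationSent() { pending++; }

    void replicationDone() { pending--; runIfReady(); }

    /** Called by the session once every replication for the operation has been issued. */
    void complete() { complete = true; runIfReady(); }

    private void runIfReady() {
        if (pending == 0 && (complete || !waitForComplete)) {
            List<Runnable> ready = new ArrayList<>(tasks);
            tasks.clear();
            ready.forEach(Runnable::run);
        }
    }
}

final class ReplicationRaceDemo {
    /** Replays the trace where op1 is acknowledged before op2 is sent. */
    static List<String> replay(boolean waitForComplete) {
        List<String> log = new ArrayList<>();
        CountingContext ctx = new CountingContext(waitForComplete);
        ctx.addTask(() -> log.add("response sent"));
        ctx.replicationSent();      // replicate op1, pending count is now 1
        ctx.replicationDone();      // response for op1, pending count is now 0
        log.add("op2 replicated");
        ctx.replicationSent();      // replicate op2, pending count is now 1
        ctx.replicationDone();      // response for op2, pending count is now 0 again
        ctx.complete();             // session signals that nothing more will be sent
        return log;
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // [response sent, op2 replicated]
        System.out.println(replay(true));  // [op2 replicated, response sent]
    }
}
```

With the count alone, the response goes out before op2 is replicated; with the extra `complete()` call it goes out only after both acknowledgements. The sketch is deliberately single-threaded and leaves out any synchronization.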


        • 1. Re: A few observations on the committed replication code
          clebert.suconic


          1) I will think about this... will post more about that later today.

          2) Yes, we discussed that, but in all the scenarios I thought of and tested after we talked, we would be okay.
          When the counter goes to 0, that means everything is already replicated, so I could just flush the callbacks, even if there are more transactions to come.

          3 and 4) This is because I wanted to flush the executors as soon as the counter reached 0. I could remove that if I start having a complete event.
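One possible shape for such a "complete" event, sketched without locks: the counter starts at 1, the session holds that extra count until it calls `complete()`, and whichever thread brings the count to zero drains a ConcurrentLinkedQueue of tasks on its own thread. This is only an illustration under assumptions: `CompleteEventContext` and its methods are hypothetical names, the context is single-use, and nothing here is taken from the committed code.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-use context with a "complete" event; names are illustrative. */
final class CompleteEventContext {
    // Starts at 1: the session holds one extra count until it calls complete().
    private final AtomicInteger pending = new AtomicInteger(1);
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    void addTask(Runnable task) { tasks.add(task); }

    void replicationSent() { pending.incrementAndGet(); }

    void replicationDone() { release(); }

    /** Called once by the session after it has issued every replication. */
    void complete() { release(); }

    private void release() {
        // Only the caller that brings the count to zero drains the queue.
        if (pending.decrementAndGet() == 0) {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
        }
    }
}

final class CompleteEventDemo {
    /** Returns how many acknowledgements had arrived when the response task ran. */
    static int acksSeenByResponse(int operations) {
        CompleteEventContext ctx = new CompleteEventContext();
        AtomicInteger acked = new AtomicInteger();
        AtomicInteger seen = new AtomicInteger(-1);
        CountDownLatch responseSent = new CountDownLatch(1);
        ctx.addTask(() -> {
            seen.set(acked.get());
            responseSent.countDown();
        });
        for (int i = 0; i < operations; i++) {
            ctx.replicationSent();
            // Each acknowledgement arrives on its own thread, in no particular order.
            new Thread(() -> {
                acked.incrementAndGet();
                ctx.replicationDone();
            }).start();
        }
        ctx.complete();
        try {
            responseSent.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(acksSeenByResponse(8));
    }
}
```

Because the count cannot reach zero before `complete()` is called, the response task always observes every acknowledgement, and no separate executor or synchronized block is needed in this sketch. Re-using one context across operations would need a reset step that is not shown.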

          • 2. Re: A few observations on the committed replication code
            timfox

             

            "clebert.suconic@jboss.com" wrote:

            1) I will think about this... will post more about that later today.

            2) Yes, we discussed that, but in all the scenarios I thought of and tested after we talked, we would be okay.
            When the counter goes to 0, that means everything is already replicated,


            Why do you think that? See my example in the previous post - you cannot guarantee it won't happen.


            so I could just flush the callbacks, even if there are more transactions to come.

            3 and 4) This is because I wanted to flush the executors as soon as the counter reached 0. I could remove that if I start having a complete event.


            I don't understand what you mean by flush the executors...

            • 3. Re: A few observations on the committed replication code
              timfox

              Also:

              // TODO: Verify Exception handling here with Tim

              • 4. Re: A few observations on the committed replication code
                clebert.suconic

                 

                send a message
                replicate op1
                pending count is now 1
                response for op1 comes back
                pending count is now 0 !!
                response packet is sent back to user
                replicate op2
                pending count is now 1
                response for op2 comes back
                pending count is now 0 again


                The response is not sent back to the user until the end.

                So this follows the same flow as it would without replication.

                No response is sent back to the user for op1 on its own. Op1 could be added to the queue at that point, which is okay; it follows the regular flow of execution.


                Also, for 1): replication is not only used in the context of a session. It is also used by, say, paging and some of the clustering operations. That's why I'm using a ThreadLocal for that.
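As a rough illustration of the ThreadLocal approach mentioned here, the sketch below keeps one context per calling thread, so callers outside a session (paging, clustering) can reach it without passing it around. `ThreadBoundContext`, `current()` and `clear()` are hypothetical names chosen for this example and are not the committed code.

```java
/** Hypothetical per-thread context holder; names are illustrative only. */
final class ThreadBoundContext {
    // One lazily created context per thread.
    private static final ThreadLocal<ThreadBoundContext> CURRENT =
            ThreadLocal.withInitial(ThreadBoundContext::new);

    private ThreadBoundContext() { }

    /** Returns the context bound to the calling thread, creating it on first use. */
    static ThreadBoundContext current() { return CURRENT.get(); }

    /** Drops the calling thread's context so the next call to current() creates a new one. */
    static void clear() { CURRENT.remove(); }
}

final class ThreadBoundContextDemo {
    static boolean sameOnOneThread() {
        return ThreadBoundContext.current() == ThreadBoundContext.current();
    }

    static boolean differentAcrossThreads() {
        ThreadBoundContext mine = ThreadBoundContext.current();
        ThreadBoundContext[] other = new ThreadBoundContext[1];
        Thread t = new Thread(() -> other[0] = ThreadBoundContext.current());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return other[0] != null && other[0] != mine;
    }

    public static void main(String[] args) {
        System.out.println(sameOnOneThread());
        System.out.println(differentAcrossThreads());
    }
}
```

A holder like this re-uses the same context for repeated calls on one thread, which is close in spirit to the re-use suggested in point 1, but whether it fits depends on which thread each operation runs on.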

                • 5. Re: A few observations on the committed replication code
                  clebert.suconic

                  A real example:


                  While routing a message, you have this replicated:

                  I - AddRecord for the AddMessage
                  II - AddUpdate for the Reference

                  III - add a callback to Queue.add();

                  IV - Return the answer to the client



                  If the counter is 0 at III (I mean, if I and II are already replicated), it's okay to add to the queue right away, the same way it would be okay if nothing were replicated. That just respects the regular flow of events.