0 Replies Latest reply on Feb 17, 2017 6:05 AM by vladsz83

    Inter-site replications, offline event. Backup operations queue.

    vladsz83

      Hello!

       

      Could anyone please answer my questions about cross-site replication?

      My goal is to keep a backup site. I configured the cluster and launched the primary and backup sites, which seem to work in general.

       

      My test is pretty simple:

      - A node in the backup site starts and periodically prints the size of its copy of the cache.

      - A node in the primary site starts and fills the cache. Afterwards it performs many randomly chosen puts/removes and periodically prints the size of the cache.

      - During this process I break and restore the connection between the sites at different intervals and watch whether the cache size in the backup diverges from the primary one.
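
      The test steps above boil down to roughly the following. This is only a minimal sketch: two plain maps stand in for the primary and backup caches, the unbounded `pending` queue is an assumption about how missed operations are kept, and every class and method name is made up for illustration. None of it is Infinispan API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Random;

// Plain-Java stand-in for the test: two maps play the primary and backup caches.
// Nothing here is Infinispan API; all names are hypothetical.
class XSiteSizeCheck {
    // One replicated operation: a put, or a remove when value is null.
    static final class Op {
        final int key;
        final String value;
        Op(int key, String value) { this.key = key; this.value = value; }
    }

    final Map<Integer, String> primary = new HashMap<>();
    final Map<Integer, String> backup = new HashMap<>();
    // Assumption: operations made while the link is down sit in an unbounded backlog.
    final Queue<Op> pending = new ArrayDeque<>();
    boolean linkUp = true;

    private static void applyTo(Map<Integer, String> cache, Op op) {
        if (op.value == null) cache.remove(op.key); else cache.put(op.key, op.value);
    }

    // Apply an operation on the primary and forward it if the link is up.
    void submit(Op op) {
        applyTo(primary, op);
        pending.add(op);
        if (linkUp) drain();
    }

    // Replay every queued operation on the backup, in order.
    void drain() {
        Op op;
        while ((op = pending.poll()) != null) applyTo(backup, op);
    }

    // Break or restore the link; restoring it flushes the backlog.
    void setLinkUp(boolean up) {
        linkUp = up;
        if (up) drain();
    }

    // Random puts/removes; the link is toggled every `toggleEvery` operations.
    void run(int ops, int keySpace, int toggleEvery, long seed) {
        Random rnd = new Random(seed);
        for (int i = 1; i <= ops; i++) {
            int key = rnd.nextInt(keySpace);
            submit(new Op(key, rnd.nextBoolean() ? "v" + i : null));
            if (i % toggleEvery == 0) {
                setLinkUp(!linkUp);
                System.out.printf("ops=%d linkUp=%b primary=%d backup=%d%n",
                        i, linkUp, primary.size(), backup.size());
            }
        }
        setLinkUp(true); // restore the link at the end so the backlog is flushed
    }

    public static void main(String[] args) {
        XSiteSizeCheck t = new XSiteSizeCheck();
        t.run(100_000, 5_000, 10_000, 42L);
        System.out.println("final sizes equal: " + (t.primary.size() == t.backup.size()));
    }
}
```

      In this model the backup always catches up because the backlog is never truncated. In the real test that is exactly what I cannot rely on.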

       

      I can't figure out the following:

       

      1) Are the state transfer and take-offline configuration options still in use? I can't see that options like

       

      takeOffline().afterFailures(1).minTimeToWait(1).backup()

      .stateTransfer().maxRetries(1).waitTime(1)

       

      have any effect. They seem to do nothing. Even with this configuration my backup may receive all the operations (up to 300k-500k puts and many removes) after 5 minutes without a connection, during which the primary node kept working.
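
      For reference, this is how I would expect `afterFailures` and `minTimeToWait` to combine when both are positive: the site goes offline only once enough consecutive backup calls have failed and enough time has passed since the first failure. The sketch below is a plain-Java model of that expectation, not Infinispan code. The class name, the millisecond unit, and the reset on success are assumptions.

```java
// Plain-Java model of the expected take-offline rule; not Infinispan code.
// The class name and the reset-on-success behaviour are assumptions.
class TakeOfflineModel {
    private final int afterFailures;      // consecutive failed backup calls required
    private final long minTimeToWaitMs;   // minimum time since the first failure (assumed ms)
    private int failures;
    private long firstFailureAtMs = -1;

    TakeOfflineModel(int afterFailures, long minTimeToWaitMs) {
        if (afterFailures <= 0 || minTimeToWaitMs <= 0) {
            // Zero or negative settings are deliberately left out of this model.
            throw new IllegalArgumentException("only positive values are modeled");
        }
        this.afterFailures = afterFailures;
        this.minTimeToWaitMs = minTimeToWaitMs;
    }

    // Record a failed backup call; remember when the failure streak started.
    void onFailure(long nowMs) {
        if (failures++ == 0) firstFailureAtMs = nowMs;
    }

    // A successful backup call ends the failure streak.
    void onSuccess() {
        failures = 0;
        firstFailureAtMs = -1;
    }

    // Offline only when both thresholds are met.
    boolean isOffline(long nowMs) {
        return failures >= afterFailures && nowMs - firstFailureAtMs >= minTimeToWaitMs;
    }

    public static void main(String[] args) {
        TakeOfflineModel m = new TakeOfflineModel(1, 1); // the values from my configuration
        m.onFailure(0);
        System.out.println("offline at t=0ms: " + m.isOffline(0));
        System.out.println("offline at t=1ms: " + m.isOffline(1));
    }
}
```

      With values of 1 and 1 I would expect the backup to be marked offline almost immediately after the first failed call, which is not what I observe.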

       

      2) How can I catch the site online/offline event? I need to know when it happens. I couldn't find the offline state of the backup in the XSiteAdmin bean until I set it manually. It looks like my backup never went offline from the main site's point of view.

       

      3) What is the queue of operations waiting to be sent to the backup? Is it a cache? How do I configure its length? As I mentioned before, I see that Infinispan is able to keep a log of operations and back them up later. But for how long? What is the threshold? Depending on the test conditions, my backup might not get all the operations after a connection break and then requires a full state transfer.

       

      4) Is #3 also related to the configuration of any protocol in the stack? There are buffers, bundling and so on.

       

      5) How can I avoid tons of errors like

       

      siteMaster: exception sending bundled msgs: java.net.SocketTimeoutException: connect timed out

      PM org.jgroups.protocols.TP$BaseBundler sendMessageList

       

      Why does it keep spamming even after I manually took the backup offline?

       

      The connection might be lost for hours or even longer.

       


       

      Any ideas? Thanks