3 Replies Latest reply on Aug 11, 2017 6:06 AM by ryanemerson

    Infinispan 8.2.6 data inconsistency after recovering from split-brain-problem

    stehlik.michal

      Hi All,

       

      We have a system with 8 controllers {A, B, C, D, E, F, G, H} in a cloud. Some caches are distributed with owners="5". All caches are transactional. Each controller keeps its data in a special configuration file, and all data from this file is then put into the cache. When we build the cloud, everything works fine and all data is available. From time to time, one of the controllers is disconnected from the network (e.g. controller A). We then have two separate clouds, {A} and {B, C, D, E, F, G, H}. From the point of view of {B, C, D, E, F, G, H}, all data is available in that topology; from the point of view of {A}, some of the data is not available. After some time, controller {A} rejoins the cloud: the old topology was {A} and {B, C, D, E, F, G, H}, and the new topology is {A, B, C, D, E, F, G, H}. We have created a @Merged listener for the CacheManager. When the merge happens, this listener is triggered and, based on some internal rules, commands controller {A} to put its data into the cloud again, since its data stored in {B, C, D, E, F, G, H} may be outdated.
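
      To make the availability picture during the split concrete, here is a minimal, library-free sketch. It does not use Infinispan at all: the placement rule (each segment owned by 5 consecutive controllers) and the names SplitBrainSketch, owners and countAvailable are assumptions made up for illustration, and Infinispan's real consistent hash distributes segments differently. It only shows why, with owners="5", the partition {B, C, D, E, F, G, H} can still see every entry while {A} sees only the entries it owns a copy of.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Infinispan's consistent hash, just an illustrative placement rule.
class SplitBrainSketch {

    // Hypothetical placement: a segment is owned by numOwners consecutive
    // nodes, starting at index (segment % nodes.size()) and wrapping around.
    static List<String> owners(int segment, List<String> nodes, int numOwners) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < numOwners; i++) {
            result.add(nodes.get((segment + i) % nodes.size()));
        }
        return result;
    }

    // Counts the segments that have at least one owner inside the given partition.
    static int countAvailable(List<String> nodes, int numOwners, int segments,
                              List<String> partition) {
        int available = 0;
        for (int s = 0; s < segments; s++) {
            for (String owner : owners(s, nodes, numOwners)) {
                if (partition.contains(owner)) {
                    available++;
                    break; // one reachable owner is enough for this segment
                }
            }
        }
        return available;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C", "D", "E", "F", "G", "H");
        List<String> majority = nodes.subList(1, 8); // {B, ..., H}
        List<String> minority = nodes.subList(0, 1); // {A}
        int segments = 8; // assumed segment count for the toy model

        System.out.println("Segments visible to {B..H}: "
                + countAvailable(nodes, 5, segments, majority) + " of " + segments);
        System.out.println("Segments visible to {A}:    "
                + countAvailable(nodes, 5, segments, minority) + " of " + segments);
    }
}
```

      In this model every segment keeps at least 4 of its 5 owners on the {B, C, D, E, F, G, H} side, so nothing is unreachable there, which matches what we see before the merge.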

       

      We are observing a problem: when controller {A} joins back, some data from the caches is gone, namely data that is not stored on controller {A}. Which records and how many are lost is random, but we can reproduce the problem every time. Do you have any hints on what we should check? To me it is weird that random data (not belonging to controller {A}) is affected when the merge happens.

       

      Message was edited by: Michal Stehlik