6 Replies Latest reply on Nov 4, 2014 11:33 AM by wdfink

    Data file not shrinking + huge startup time

    david.novak

      Dear all,

      I am using Infinispan 6.0.2 with persistence in a single file store; it holds around 20 million entries, some 120 GB on an SSD. So far I have been using local mode, where startup takes a few minutes to load the metadata of the entries into memory. When I switched to distribution mode and started a second node, the first node handed over around half of its data, which took some 4 hours (I'm OK with that), and then the initialization finished correctly.

       

      The problem is that the data file of the first node did not shrink, so I now have two data files on two nodes: 120 GB on the first and 60 GB on the second. I closed the caches and killed the process. The next time I started both nodes, the second node started in minutes, but the first one again takes a few hours to start, probably checking all its entries and trying to hand them over to the second one (both processes show CPU and I/O activity).

       

      Could anybody tell me what I am doing wrong, please? The configuration is attached. Thank you very much.

       

      David

        • 1. Re: Data file not shrinking + huge startup time
          rvansa

          Yes, SingleFileStore cannot shrink; that's a known problem and you'll find many posts about it.

           

          When starting the cluster, the second node always needs to get the freshest data from the first node; it cannot rely on its local data, as that may be outdated.

          • 2. Re: Data file not shrinking + huge startup time
            david.novak

            Thanks Radim.

             

            Is there any way to start the nodes "simultaneously", so that none of them is "the second", or some configuration setting so that the check does not take place? Even if I use two (or more) nodes from the beginning (so that the SingleFileStore data is well distributed), would all nodes but the first check the freshness anyway (which takes hours)? Is it the same in Infinispan 7.0?

             

            Thank you, David

            • 3. Re: Data file not shrinking + huge startup time
              rvansa

              No, not yet. See [ISPN-3351] Controlled cluster shutdown with data restore from persistent storage - JBoss Issue Tracker. In JIRA this is scoped to 7.0, but it won't make it into that release; it will likely be implemented in 8.0, as it's a much-requested feature (and I also wonder why it was not implemented long ago).

              • 4. Re: Data file not shrinking + huge startup time
                david.novak

                I am still fighting with distribution mode together with persistent data. Now I am using this scenario:

                I start an empty node, start a second empty node, insert 20M objects, and kill them both (cache.stop() called). Then:

                1) When I start the first node (and wait until it reads its SingleFileStore), start the second (and wait), and then check the number of objects (by summing up "cache.size()" on both nodes), a few objects are missing. No error is logged during the whole process...

                2) I add a new node (and wait until it takes over 1/3 of the whole database = 7M):

                   a) summing the cache.size() from the three nodes gives me 27M. I understand that the SingleFileStore does not shrink, but the cache should not hold replicated data when the number of owners is "1" - the config is attached to the first question in this post.

                   b) more data objects are missing in the store as I add nodes...
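
To separate the two symptoms above (a summed size that is too large, and objects that are missing), it can help to compare the summed per-node counts with the number of distinct keys. The following is only a minimal sketch in plain Java: `KeyOwnershipAudit` and its methods are hypothetical names, not part of Infinispan, and how each node's key set is collected is left open as an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not an Infinispan class): compares per-node key sets. */
final class KeyOwnershipAudit {

    /** Sum of the per-node counts, i.e. what adding up size() on every node yields. */
    static <K> long summedSize(List<? extends Set<K>> keysPerNode) {
        long total = 0;
        for (Set<K> keys : keysPerNode) {
            total += keys.size();
        }
        return total;
    }

    /** Number of distinct keys across all nodes; each key is counted once. */
    static <K> int distinctSize(List<? extends Set<K>> keysPerNode) {
        Set<K> union = new HashSet<>();
        for (Set<K> keys : keysPerNode) {
            union.addAll(keys);
        }
        return union.size();
    }

    /** Keys held by more than one node, mapped to the number of nodes holding them. */
    static <K> Map<K, Integer> duplicates(List<? extends Set<K>> keysPerNode) {
        Map<K, Integer> holders = new HashMap<>();
        for (Set<K> keys : keysPerNode) {
            for (K key : keys) {
                holders.merge(key, 1, Integer::sum);
            }
        }
        // Drop keys that live on exactly one node; only the duplicates remain.
        holders.values().removeIf(count -> count < 2);
        return holders;
    }
}
```

If `summedSize` exceeds `distinctSize`, some keys are held by more than one node; if `distinctSize` is below the number of inserted objects, entries are genuinely missing.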

                Additionally: Can a node "politely" leave the system and hand its data over before it leaves?

                 

                Thank you for any hints.

                • 5. Re: Data file not shrinking + huge startup time
                  rvansa

                  1) Could you try using cache.get() to see which entry is missing? Additionally, could you use the entry retriever to iterate through the whole set?
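
The cache.get() check could be sketched roughly as follows. This is a minimal illustration, not a definitive procedure: `MissingKeyCheck` and `findMissing` are hypothetical names, and the list of expected keys is an assumption (the application has to know which keys it inserted). Since an Infinispan `Cache` implements `java.util.concurrent.ConcurrentMap`, it can be passed where a `Map` is expected; a null return from get() is treated here as "entry absent".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not an Infinispan class): probes a map-like cache key by key. */
final class MissingKeyCheck {

    /**
     * Returns the expected keys for which cache.get(key) yields null,
     * in the order in which they were probed.
     */
    static <K> List<K> findMissing(Map<K, ?> cache, Iterable<? extends K> expectedKeys) {
        List<K> missing = new ArrayList<>();
        for (K key : expectedKeys) {
            if (cache.get(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Running this against each node in turn would show whether a key is missing everywhere or only unreachable from some nodes.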

                   

                  The node should leave politely when cacheManager.stop() is called. If that does not work well with numOwners=1, you're welcome to file a JIRA.

                  • 6. Re: Data file not shrinking + huge startup time
                    wdfink