6 Replies Latest reply on Jan 15, 2014 8:31 AM by lestat79

    Garbage Collection DatabaseBinaryStore deletes all entries

    lestat79

      I'm using ModeShape 3.6 and storing my binaries through the DatabaseBinaryStore (MySQL). When garbage collection of unused binaries runs, it deletes all of my content instead of just the unused binaries.

      I took a look at the queries and found that this statement is run:

       

      # Remove all rows that have been unused for less than the supplied time

      remove_expired = DELETE FROM {0} WHERE usage_time < ?

       

      This does not seem correct, as all binaries are at least one hour old when garbage collection runs. Shouldn't this take the usage_flag into account?
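To make the concern concrete, here is a minimal Python sketch of the comparison. It assumes the `?` in `remove_expired` is bound to a cutoff equal to "now" minus the one-hour minimum age (an assumption; the Java side is not shown here), and the row names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical values: a one-hour minimum age and an arbitrary "now".
MIN_AGE = timedelta(hours=1)
now = datetime(2014, 1, 14, 12, 0, 0)
cutoff = now - MIN_AGE  # assumed to be what gets bound to the "?"

# (usage_flag, usage_time) for two rows: one still in use, one unused.
rows = {
    "in_use": (1, datetime(2014, 1, 14, 8, 31, 59)),
    "unused": (0, datetime(2014, 1, 14, 9, 15, 0)),
}

# The shipped statement only compares usage_time < cutoff ...
deleted_by_shipped = sorted(k for k, (flag, t) in rows.items() if t < cutoff)
# ... whereas also requiring usage_flag = 0 spares the row still in use.
deleted_with_flag = sorted(k for k, (flag, t) in rows.items() if flag == 0 and t < cutoff)

print(deleted_by_shipped)  # ['in_use', 'unused']
print(deleted_with_flag)   # ['unused']
```

Once both rows are older than the cutoff, the age test alone cannot tell them apart; only the flag can.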

      Also, when I delete a node that has a binary, the usage_flag is not set to 0 as you would expect.

       

      Any suggestions as to why it is set up like this, or what I'm doing wrong here?

        • 1. Re: Garbage Collection DatabaseBinaryStore deletes all entries
          rhauch

          There may be a problem with the code.

           

          But first, can you share the rest of your ModeShape and Infinispan configuration files? One reason this might happen is that you're not properly persisting (or are even evicting) the main repository content inside Infinispan. In that case, all of the references to the binary values persisted in the BinaryStore might actually go away, so the BinaryStore thinks all of the binary values are unused and removes them when garbage collection occurs.

           

          How are you setting the binary values on properties? If you just call ValueFactory.createBinary(...) and don't do anything with the Binary value, then it will get garbage collected. Be sure that you're actually setting at least one property with each Binary value that you create, and then saving the session that you used to set those properties. If you forget any one of these steps, you're not properly persisting the references to the Binary values, and thus garbage collection will discover this and will remove the unused Binary values.
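The reference rule described above can be illustrated with a toy model in Python. `ToyRepository` and its methods are hypothetical stand-ins, not ModeShape or JCR API; the only point is that a binary counts as used once a *saved* property refers to it, so a binary that is created but never set, or set but never saved, is collected.

```python
import hashlib

class ToyRepository:
    """Hypothetical stand-in for a repository plus its binary store."""

    def __init__(self):
        self.binary_store = {}   # content hash -> bytes
        self.saved_props = {}    # (node path, property name) -> content hash
        self.pending_props = {}  # property changes not yet saved

    def create_binary(self, data: bytes) -> str:
        # Binaries are stored as soon as they are created, keyed by content hash.
        key = hashlib.sha1(data).hexdigest()
        self.binary_store[key] = data
        return key

    def set_property(self, path: str, name: str, key: str) -> None:
        self.pending_props[(path, name)] = key

    def save(self) -> None:
        self.saved_props.update(self.pending_props)
        self.pending_props.clear()

    def collect_garbage(self) -> list:
        # Only references held by saved properties keep a binary alive.
        referenced = set(self.saved_props.values())
        unused = [k for k in self.binary_store if k not in referenced]
        for k in unused:
            del self.binary_store[k]
        return unused

repo = ToyRepository()
kept = repo.create_binary(b"%PDF-1.4 kept")
repo.set_property("/files/a.pdf/jcr:content", "jcr:data", kept)
repo.save()

orphan = repo.create_binary(b"never attached")  # created, never set on a property
unsaved = repo.create_binary(b"set but not saved")
repo.set_property("/files/b.pdf/jcr:content", "jcr:data", unsaved)  # save() never called

removed = repo.collect_garbage()
print(kept in repo.binary_store)                  # True
print(sorted(removed) == sorted([orphan, unsaved]))  # True
```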

           

          Finally, an obvious reason why this might happen is that you're actually removing all of the properties that are holding onto the binary values. Perhaps double check that you're not doing this.

           

          Okay, if after reviewing all of that it still looks like you're doing things correctly, then you could experiment with the "remove_expired" SQL statement. For example, you might try copying the database properties file into your classpath and updating the "remove_expired" property value to this:

           

               remove_expired = DELETE FROM {0} WHERE usage_time < ? AND usage_flag=0
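A quick way to sanity-check a flag-aware statement before dropping it into the properties file is to try it against an in-memory SQLite table. This sketch assumes the `{0}` placeholder is replaced by the table name (`content_store`, the name used in this thread), keeps only the columns the statement touches, uses integer timestamps, and assumes the bound parameter is a cutoff of "now" minus the minimum age, so expired rows are those with `usage_time` below it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified table: only the columns the statement touches (the real schema has more).
conn.execute(
    "CREATE TABLE content_store (cid TEXT PRIMARY KEY, usage_flag INTEGER, usage_time INTEGER)"
)
conn.executemany(
    "INSERT INTO content_store VALUES (?, ?, ?)",
    [
        ("in-use-old", 1, 1000),  # still referenced
        ("unused-old", 0, 1000),  # unused long enough to expire
        ("unused-new", 0, 9000),  # unused, but too recently
    ],
)

cutoff = 5000  # hypothetical: "now" minus the minimum age, same units as usage_time
conn.execute("DELETE FROM content_store WHERE usage_time < ? AND usage_flag = 0", (cutoff,))

remaining = [row[0] for row in conn.execute("SELECT cid FROM content_store ORDER BY cid")]
print(remaining)  # ['in-use-old', 'unused-new']
```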

          • 2. Re: Re: Garbage Collection DatabaseBinaryStore deletes all entries
            lestat79

            I attached our config files; these also include the cluster setup. But the garbage collection problem also occurs without clustering.

             

            As for saving, I'm sure we are doing this correctly. We also built a simple browser app, and after uploading a binary (nt:file) we can access the binary content by browsing through the nodes. If saving were not done correctly, this would not be possible, correct?

             

            Changing the query would stop the garbage collector from deleting all binaries, but since the binaries do not get marked unused when I delete them (the nodes do get deleted), the binaries would never be cleaned up.

             

            I am a bit puzzled that binaries do not get flagged as unused after deletion and that garbage collection ignores the unused flag. These seem like bugs. Can you confirm that this is not expected behaviour?

            • 3. Re: Re: Garbage Collection DatabaseBinaryStore deletes all entries
              rhauch

              Lars Weissenborn wrote:

               

              I attached our config files; these also include the cluster setup. But the garbage collection problem also occurs without clustering.

               

              As for saving, I'm sure we are doing this correctly. We also built a simple browser app, and after uploading a binary (nt:file) we can access the binary content by browsing through the nodes. If saving were not done correctly, this would not be possible, correct?

               

              Yes, that sounds like you are storing the data correctly, then.

              Changing the query would stop the garbage collector from deleting all binaries, but since the binaries do not get marked unused when I delete them (the nodes do get deleted), the binaries would never be cleaned up.

              No, that is an incorrect assumption. Binaries are garbage collected by a background process that, by default, runs once per day at around midnight local time. (See the default binary GC configuration settings.) For testing you can configure it to run more frequently; see this test configuration for an example. You cannot make it run more often than once an hour, but you could programmatically configure the start clock time to be a few minutes after you start up the repository.
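The scheduling behaviour described above can be sketched in Python. The `initial_time` and `interval_hours` parameter names are assumptions loosely modelled on the binary GC settings (check the default configuration for the real field names); the function only computes when the next few runs would fall under a simple "first slot on or after now, then every N hours" model.

```python
from datetime import datetime, timedelta

def next_gc_runs(now, initial_time, interval_hours, count=3):
    """Return the next `count` GC start times under a simple, hypothetical model.

    The first run is at `initial_time` ("HH:MM", local time) on or after `now`;
    later runs follow every `interval_hours` hours. Intervals under one hour
    are rejected, matching the one-hour minimum described in this thread.
    """
    if interval_hours < 1:
        raise ValueError("the GC interval cannot be shorter than one hour")
    hour, minute = (int(part) for part in initial_time.split(":"))
    first = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if first < now:
        first += timedelta(days=1)  # today's slot already passed; use tomorrow's
    return [first + i * timedelta(hours=interval_hours) for i in range(count)]

started = datetime(2014, 1, 14, 14, 5)  # repository started at 14:05

# Default-style schedule: daily at midnight, so the first run is the next midnight.
print(next_gc_runs(started, "00:00", 24, count=1)[0])  # 2014-01-15 00:00:00

# Test-style schedule: start five minutes after startup, then hourly.
print([t.strftime("%H:%M") for t in next_gc_runs(started, "14:10", 1)])  # ['14:10', '15:10', '16:10']
```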

              I am a bit puzzled that binaries do not get flagged as unused after deletion and that garbage collection ignores the unused flag. These seem like bugs. Can you confirm that this is not expected behaviour?

              Configure the binary GC to run more frequently, and the binaries should get flagged. GC may take a while to run, but you can turn on debug logging to see the GC-related debug messages.
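A hypothetical two-phase model helps explain why GC "may take a while": one pass flags an unreferenced binary, and only a later pass deletes it once it has been unused for longer than the minimum age. This Python/SQLite sketch assumes that marking a row unused also stamps `usage_time`; the `mark_unused` and `sweep` helpers are illustrative names, not ModeShape's implementation.

```python
import sqlite3

def mark_unused(conn, referenced, now):
    """Phase 1: flag rows that no saved property references any more."""
    rows = conn.execute("SELECT cid FROM content_store WHERE usage_flag = 1").fetchall()
    for (cid,) in rows:
        if cid not in referenced:
            conn.execute(
                "UPDATE content_store SET usage_flag = 0, usage_time = ? WHERE cid = ?",
                (now, cid),
            )

def sweep(conn, now, min_age):
    """Phase 2: delete rows that have been flagged unused for longer than min_age."""
    cur = conn.execute(
        "DELETE FROM content_store WHERE usage_flag = 0 AND usage_time < ?",
        (now - min_age,),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_store (cid TEXT PRIMARY KEY, usage_flag INTEGER, usage_time INTEGER)"
)
conn.executemany("INSERT INTO content_store VALUES (?, 1, 0)", [("kept",), ("deleted-node",)])
referenced = {"kept"}  # the node that held "deleted-node" was removed and saved

# First GC pass at t=100 (minutes): the orphan is only flagged.
mark_unused(conn, referenced, now=100)
first = sweep(conn, now=100, min_age=60)
# A later pass at t=200: the orphan has now been unused for 100 minutes.
mark_unused(conn, referenced, now=200)
second = sweep(conn, now=200, min_age=60)

print(first, second)  # 0 1
print([r[0] for r in conn.execute("SELECT cid FROM content_store")])  # ['kept']
```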

              • 4. Re: Re: Garbage Collection DatabaseBinaryStore deletes all entries
                lestat79

                It's not that garbage collection isn't triggering; it's that its behaviour seems incorrect. Here are the steps I'm taking:

                 

                • I start with an empty workspace: no nodes except system nodes.
                • Add a file node (nt:file) to the workspace with its jcr:content child and its binary (a PDF in this case).
                • After saving, if I query the string_table in the database, I find entries for the added nodes.
                • In content_store (the binary store table), a new entry is added with the binary as payload and usage_flag set to 1:

                 

                cid: f94d07ae15cb502c6d181de01ddabcbd5d6de248
                mime_type: application/pdf
                ext_text: some text
                usage_flag: 1
                usage_time: 2014-01-14 08:31:59
                payload: BLOB

                 

                • In our webapp, I navigate to the node and download the binary. Everything is OK so far.
                • I add a second nt:file (one that I will not delete!)
                • I delete the first file node (delete node, save session)
                • In string_table, the node is correctly deleted
                • In the content_store table, nothing has changed (usage_flag=1 for both binaries). This seems incorrect: shouldn't the binary be marked as unused? There are no related events in the logs either.
                • So now I wait for garbage collection to trigger (I have to wait at least 1 hour, as garbage collection will not clean up binaries younger than that).
                • There are two binaries in content_store. Garbage collection triggers.
                • In the logs, I can see that garbage collection has run. This is the statement that was run: DELETE FROM content_store WHERE usage_time < ?
                • When I check content_store, both binaries have been deleted.
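The sequence above can be replayed against an in-memory SQLite table with a Python sketch (simplified columns, integer timestamps in seconds, and a cutoff assumed to be "now" minus one hour). It shows both halves of the problem: the shipped statement removes every sufficiently old row, and a flag-aware variant removes nothing while every usage_flag stays at 1.

```python
import sqlite3

HOUR = 3600
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_store (cid TEXT PRIMARY KEY, mime_type TEXT, "
    "usage_flag INTEGER, usage_time INTEGER)"
)

# First nt:file saved: its binary row arrives with usage_flag = 1.
conn.execute("INSERT INTO content_store VALUES ('first', 'application/pdf', 1, 0)")
# Second nt:file saved (this one is kept).
conn.execute("INSERT INTO content_store VALUES ('second', 'application/pdf', 1, 60)")
# The first node is deleted and the session saved; as reported,
# nothing touches content_store, so both rows keep usage_flag = 1.

# GC runs more than an hour later; the cutoff is assumed to be now - 1 hour.
now = 2 * HOUR
shipped = "DELETE FROM content_store WHERE usage_time < ?"
flag_aware = "DELETE FROM content_store WHERE usage_time < ? AND usage_flag = 0"

# The flag-aware variant first: with every flag still 1 it deletes nothing.
flag_aware_deleted = conn.execute(flag_aware, (now - HOUR,)).rowcount
# The shipped statement then deletes both rows, including the one still in use.
shipped_deleted = conn.execute(shipped, (now - HOUR,)).rowcount
left = conn.execute("SELECT COUNT(*) FROM content_store").fetchone()[0]

print(flag_aware_deleted, shipped_deleted, left)  # 0 2 0
```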

                 

                This, of course, should not have happened. Only the first binary is unused, but because the query does not take the usage_flag into account, all binaries older than 1 hour get deleted. And even if the query did take the usage_flag into account, it wouldn't make any difference, because the binaries never get marked as unused at all.

                • 5. Re: Re: Garbage Collection DatabaseBinaryStore deletes all entries
                  rhauch

                  Can you please log a bug in our JIRA for this one? It obviously needs more work to diagnose and fix. If you have a test case, please attach it since it would be very useful. Thanks.