14 Replies Latest reply on Mar 19, 2010 4:32 PM by clebert.suconic

    Journal as a separate project

    clebert.suconic

      The Journal itself has minimal dependencies on the core. The only ones I remember are HornetQException and logging. (The native methods actually have references to those areas.)

       

       

      There are other projects that currently need this kind of Journal. We could either make it a separate JAR with its own Maven artifact, or make it a separate project.

       

      I think the best option would be a separate project, so if a project needs a new feature or a specific fix, we could make a new release without waiting for the full release cycle of HornetQ.

       

      Any considerations and ideas?

       

       

      I will need to schedule time for this activity.

        • 1. Re: Journal as a separate project
          timfox

          Eventually, we can roll it out as a separate project. Initially, however, we can just add an extra ant target to create a jar, which we can upload to Maven as part of the release process.

           

          Can someone add a JIRA task for this?

          • 2. Re: Journal as a separate project

            Just a side note: the current hornetq maven artifacts have no dependency details. I remember some discussion a while back about maven support, or even moving hornetq to use maven directly. As much as we all hate maven, managing this kind of multi-module project with it is much simpler, and if the goal is to make using the core-api easier, having a single dependency in a pom makes it simple to get started. Then if I want the JMS support I just add that dependency. Whereas now I need to know what the dependencies for hornetq-core are.

             

            Tim, if you want to create a branch for me, I will have a go at maven-ising it.

             

            Sorry to hijack this thread.

             

            --Aaron

            • 3. Re: Journal as a separate project
              clebert.suconic
              • 4. Re: Journal as a separate project
                ataylor

                I think we should do as Tim suggests: just add an extra target and upload the jar (or whatever) into the maven repos.

                 

                Regarding the dependency on HornetQException, is there another way to do this? It means that you can't build the native journal libs without the HornetQ API, and we can't build HornetQ without the native libs. I am not sure how this would work if the journal was completely separate.

                • 5. Re: Journal as a separate project
                  clebert.suconic

                  Yes, I will have to remove the dependency on HornetQException.

                   

                  I think the dependency on HornetQLogger is still fine, as the logger is a separate project. I will check on that.

                  • 6. Re: Journal as a separate project
                    clebert.suconic
                    • 7. Re: Journal as a separate project
                      jhalliday

                      I've been looking at the current journal implementation and I've found the dependencies on other modules to be more extensive than they were when I looked at an earlier prototype. For example, there is now a transitive dependency on netty, caused by the use of HornetQBuffer in EncodingSupport. Why do I need to pull in a networking library just to write to disk? Anyhow, the point is I think disentangling the journal code from core is going to be a bigger job than expected.

                       

                      I'm also struggling to understand some of the methods (what the heck is a perf blast?) and config options (e.g. JournalImpl constructor params), how to pick sensible default values for them and the likely performance implications of changing those values. There is going to have to be some kind of documentation at that level to make the journal useful standalone.

                       

                      I can't speak for other potential users, but from my perspective I can live with pulling in the whole of hornetq-core for now (It's just a mvn config) during development, but I can't make much more progress without better docs, so if this has to be triaged to fit in the current release schedule then that's my preferred work priority.

                      • 8. Re: Journal as a separate project
                        jhalliday

                        As part of a longer term endeavor to make the journal useful for other applications, some API enhancements may also be desirable. For example, I'd like a saveOrUpdate function which would intelligently select between addRecord and updateRecord as needed, so that I don't have to keep track of that myself, duplicating work the journal already does internally. And whilst I'm on the subject, some documentation of the API's thread-safety properties would be helpful. As far as I can tell, Bad Things may happen if I e.g. call addRecord and then call updateRecord without waiting for completion of the addRecord for the same record id, as there is no ordering guarantee.
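A minimal sketch of the saveOrUpdate idea described above. It assumes a hypothetical, simplified `RecordJournal` interface with only `appendAddRecord` and `appendUpdateRecord`; this is not the real HornetQ Journal interface. The names `LoggingJournal` and `SaveOrUpdateJournal` are also made up for illustration. The wrapper remembers which ids it has already written and is synchronized, so the add for an id is always issued before any update for that id.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified journal interface (not the real HornetQ API).
interface RecordJournal {
    void appendAddRecord(long id, byte[] record);
    void appendUpdateRecord(long id, byte[] record);
}

// In-memory stand-in that only logs which operation was requested.
final class LoggingJournal implements RecordJournal {
    final List<String> log = new ArrayList<>();

    @Override
    public void appendAddRecord(long id, byte[] record) {
        log.add("ADD " + id);
    }

    @Override
    public void appendUpdateRecord(long id, byte[] record) {
        log.add("UPDATE " + id);
    }
}

// Chooses between add and update so the caller does not have to.
final class SaveOrUpdateJournal {
    private final RecordJournal delegate;
    private final Set<Long> knownIds = new HashSet<>();

    SaveOrUpdateJournal(RecordJournal delegate) {
        this.delegate = delegate;
    }

    // synchronized: the add/update decision and the append happen as one step,
    // so two threads writing the same id cannot both choose "add".
    synchronized void saveOrUpdate(long id, byte[] record) {
        if (knownIds.add(id)) {
            delegate.appendAddRecord(id, record);
        } else {
            delegate.appendUpdateRecord(id, record);
        }
    }
}
```

In this sketch the set of known ids lives only in memory; a real wrapper would presumably have to rebuild it from the journal when it is loaded.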

                        • 9. Re: Journal as a separate project
                          clebert.suconic

                          I will look into the dependencies as part of the JIRA.

                           

                          Also, in regard to the saveOrUpdate you need: please keep in mind that on the journal, updates are always appends with the same ID.

                           

                          When you replay the journal you're going to receive both the add and the updates.

                           

                          Also, if this is long-lived information, I would advise a delete and a new add (they can be done as part of the same journal transaction):

                           

                          - Say the first append is in one journal file and the second append is in a different journal file. You now have a dependency between those two files, and if the record is never deleted you will probably trigger the compactor sooner.
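The delete-and-re-add suggestion can be sketched roughly as follows. `TinyTxJournal` is a hypothetical in-memory stand-in, not the HornetQ journal, and its method names are illustrative only. The sketch also assumes the replacement record gets a new id. Operations are buffered per transaction and applied together on commit, so the old record disappears and the new one appears in a single step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory journal; names are illustrative, not the HornetQ API.
final class TinyTxJournal {
    // Live records by id, as a replay would see them after commit.
    private final Map<Long, String> live = new LinkedHashMap<>();
    // Operations buffered per transaction id until commit.
    private final Map<Long, List<Runnable>> pending = new LinkedHashMap<>();

    void addTransactional(long txId, long id, String data) {
        pending.computeIfAbsent(txId, k -> new ArrayList<>()).add(() -> live.put(id, data));
    }

    void deleteTransactional(long txId, long id) {
        pending.computeIfAbsent(txId, k -> new ArrayList<>()).add(() -> live.remove(id));
    }

    // Applies every buffered operation of the transaction in order.
    void commit(long txId) {
        List<Runnable> ops = pending.remove(txId);
        if (ops != null) {
            ops.forEach(Runnable::run);
        }
    }

    // Discards the buffered operations; live records are untouched.
    void rollback(long txId) {
        pending.remove(txId);
    }

    // Returns a copy so callers cannot mutate the journal's state.
    Map<Long, String> liveRecords() {
        return new LinkedHashMap<>(live);
    }
}

final class ReplaceLongLivedRecord {
    // Replaces oldId with newId in one transaction: after commit only the
    // new record is live; before commit only the old one is.
    static void replace(TinyTxJournal journal, long txId, long oldId, long newId, String data) {
        journal.deleteTransactional(txId, oldId);
        journal.addTransactional(txId, newId, data);
        journal.commit(txId);
    }
}
```

Because the old record is deleted, nothing keeps the older journal file referenced on its behalf, which is the point of the suggestion.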

                          • 10. Re: Journal as a separate project
                            timfox

                            As clebert mentions, we'll do all of this as part of the JIRA.

                             

                            The problem we have right now, as always, is limited resources, and unfortunately this doesn't come up too high on the priority list.

                            • 11. Re: Journal as a separate project
                              jhalliday

                              > Also, in regard to the saveOrUpdate you need: please keep in mind that on the journal, updates are always appends with the same ID.

                               

                              I'm not too clear what your point is there. You seem to be just confirming that there is no effective difference between add and update, so why should the API make me choose between them? I want to tell the journal 'put this on disk' and have it figure out the rest - it has all the info it needs to do that.

                               

                              > Also, if this is long-lived information, I would advise a delete and a new add (they can be done as part of the same journal transaction):

                               

                              Thanks for the hint, it's handy in the short term. But in terms of the API design discussion: Don't bother me with your implementation details! As a user I don't care about multiple files, compactors, etc. At most I'll go out of my way to provide a 'boolean isLongLived' flag with the write method call, but deciding how to implement that optimally is the Journal's problem. A Journal impl that reserves space for updates in the same file as the initial write may actually suffer a performance hit if the user utilises a remove and re-add model as you suggest.

                              • 12. Re: Journal as a separate project
                                clebert.suconic

                                The journal is a replay, right?

                                 

                                The journal itself doesn't know if the update is a replacement or additional information.

                                 

                                In the first record you could be storing someone's name. In the update record you could be storing someone's address. So you would in fact need both records to recover the whole data.
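A rough sketch of that replay view, using made-up types (`ReplayAllRecords`, `Entry`, `Kind`) rather than the real journal classes. Every entry for an id is handed back in append order, so a caller that stored the name in the add and the address in the update can combine the two.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReplayAllRecords {
    enum Kind { ADD, UPDATE }

    // One replayed journal entry; a hypothetical stand-in for what a load returns.
    static final class Entry {
        final Kind kind;
        final long id;
        final String data;

        Entry(Kind kind, long id, String data) {
            this.kind = kind;
            this.id = id;
            this.data = data;
        }
    }

    // Groups every replayed entry by id, preserving append order, so the
    // caller can combine partial records (e.g. name, then address).
    static Map<Long, List<String>> replay(List<Entry> entries) {
        Map<Long, List<String>> byId = new LinkedHashMap<>();
        for (Entry e : entries) {
            if (e.kind == Kind.UPDATE && !byId.containsKey(e.id)) {
                throw new IllegalStateException("update before add for id " + e.id);
            }
            byId.computeIfAbsent(e.id, k -> new ArrayList<>()).add(e.data);
        }
        return byId;
    }
}
```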

                                 

                                 

                                 

                                I will look into providing an addOrReplace function that will delete the previous record, if one exists, and replace it with a new one. This would solve both the problem with long-lived data and the case where an update contains the whole data (so everything stored previously should be removed).

                                • 13. Re: Journal as a separate project
                                  jhalliday

                                  > The journal is a replay, right?

                                   

                                  It's a data storage mechanism that happens to use a replay implementation. Having to know details of that impl is overcomplicating matters for many use cases. Mostly I just want to recover the final state, not all the intermediate events that lead to it.

                                   

                                  > The journal itself doesn't know if the update is a replacement or an information add.

                                   

                                  The difference between:

                                   

                                  void appendAddRecord(long id, byte[] record)

                                   

                                  and

                                   

                                  void appendUpdateRecord(long id, byte[] record)

                                   

                                  is that the latter will barf if you call it before having called appendAddRecord. Other than that they do the same thing: put the byte[] onto the disk. All I'm after is:

                                   

                                  void appendWriteRecord(long id, byte[] record) {
                                    // alreadySeenId is the missing piece: the journal already tracks this internally
                                    if (alreadySeenId(id)) {
                                      appendUpdateRecord(id, record);
                                    } else {
                                      appendAddRecord(id, record);
                                    }
                                  }

                                   

                                  > In the first record you could be storing someone's name. In the update record you could be storing someone's address. So you would in fact need both records to recover the whole data.

                                   

                                  In my view that represents bad design of the app. The id key represents the data you want to retrieve, which is an entire person. You serialize the person (name and address) into the byte[]. The update can overwrite the original data (as long as it does so atomically). If you want to treat the name and address as independent entities, give them different id keys.

                                   

                                  BTW, the implementation of RecordInfo.equals is not consistent with what you say. There may be more than one event (RecordInfo) for the same id and they are NOT the same - they may represent different events e.g. multiple updates.

                                   

                                  The fundamental problem here is that I don't want an event stream, I want a data store. If that data store is implemented using an event stream that's fine, but that's an implementation detail not something the API should force me to be aware of. Take persisting your system config params for example: you don't need to re-read all the previous values at startup, you just need the latest set. The existing code is a great starting point for building that data store, but it's not the whole finished article because that's not its intended purpose in life.

                                   

                                  So what I really want here is not additions to the Journal API to turn it into something it was never intended to be, but instead a new higher level data store API whose implementation is written in terms of calls to the Journal. The question is: what's the best home for that code? If you guys have no need for it, it's pointless putting it in HornetQ, but if the 'standalone journal library' is going to become something generally useful to other projects then there needs to be a way to add utility code like that. Which rather implies that just repackaging the journal classes as a separate artifact may not be the best way to go in the long run.
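As an illustration of what such a higher level layer might look like, here is a minimal sketch. `AppendLog` is a hypothetical in-memory stand-in for the journal, and `LatestValueStore` is an invented name; neither exists in HornetQ. The store appends every write as an event, but on recovery it folds the event stream into the latest value per id, so callers never see the intermediate events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only log standing in for the journal.
final class AppendLog {
    static final class Event {
        final long id;
        final byte[] data; // null marks a delete

        Event(long id, byte[] data) {
            this.id = id;
            this.data = data;
        }
    }

    private final List<Event> events = new ArrayList<>();

    void append(long id, byte[] data) {
        events.add(new Event(id, data));
    }

    List<Event> replay() {
        return new ArrayList<>(events);
    }
}

// Data-store view: callers only see put/remove/get and the latest value per id.
final class LatestValueStore {
    private final AppendLog log;
    private final Map<Long, byte[]> latest = new HashMap<>();

    LatestValueStore(AppendLog log) {
        this.log = log;
        // Recovery folds the event stream into final state; later events win.
        for (AppendLog.Event e : log.replay()) {
            if (e.data == null) {
                latest.remove(e.id);
            } else {
                latest.put(e.id, e.data);
            }
        }
    }

    void put(long id, byte[] data) {
        log.append(id, data.clone());
        latest.put(id, data.clone());
    }

    void remove(long id) {
        log.append(id, null);
        latest.remove(id);
    }

    // Returns a copy of the latest value, or null if the id is absent.
    byte[] get(long id) {
        byte[] value = latest.get(id);
        return value == null ? null : value.clone();
    }
}
```

The log still holds every event, so the same underlying storage could serve a replay-style consumer; only the store hides that detail.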

                                  • 14. Re: Journal as a separate project
                                    clebert.suconic

                                    All we need is a replace method with the semantics you're expecting. Then the same journal could be used to replay events or just replace data as you expect. In that case the journal would just be a data store.

                                     

                                    I may change some of the method names to fit that better.

                                     

                                     

                                    I would say it's a better choice to reuse the journal between HornetQ and other usages (even if we have to adapt some of the method names, etc.).

                                     

                                    You would have HornetQ as a good tester of the same project you're using.