18 Replies. Latest reply on Nov 10, 2012 5:11 PM by zeeman

    Infinispan with Hibernate and Hibernate Search

    zeeman

      I would like to use Infinispan in a clustered environment for a large project. The stack: AS7.1, Hibernate 4, Hibernate Search 4.1, and Seam 3. I know, too much gambling on Red Hat products.

       

      The requirement is to use Infinispan as a distributed 2nd level Hibernate cache, and to use Infinispan with Hibernate Search to offer distributed indexing/searching.

       

      There are three app servers, two db servers behind a load balancer (think of them as one DB). Each app server has its own 2nd level cache and Lucene index.

       

      A write on any app server needs to be sent to other nodes so data stays in sync. The tricky part here is that both caching and indexing need updated data from other app server nodes. It makes sense to send that data once.

       

      1. Is the above a good architecture for high availability and is it feasible with Infinispan? Hibernate Search offers JMS master/slave, but that's a single point of failure on the master.
      2. How can the Infinispan cache (with a file store) be used with Hibernate Search to store indexes? The Hibernate Search manual recommends still using JMS, but I cannot because of #1.
      3. What are the changes needed to make it a distributed 2nd level cache?
      4. Do I need to manually listen for data updates from other nodes and update the 2nd level cache and index myself?
      5. Do you recommend another approach?
        • 1. Re: Infinispan with Hibernate and Hibernate Search
          sannegrinovero

          Hi zeeman,

          this is certainly possible, and there are many options.

           

          First thing you should be aware of is that AS7.1 provides Infinispan out of the box to make it as easy as possible to be used as 2nd level cache for Hibernate, but to cache database access only.

           

          To store the index via Hibernate Search + Infinispan you might be able to reuse the same Infinispan instance provided by AS7 but I didn't test that yet and we're still working on it; an easy alternative for now is to have Search start and manage a dedicated Infinispan instance; that shouldn't be much of a problem as they would need differently configured caches anyway.

           

          Do you need a strong guarantee for the index updates to be processed immediately? This is often a relaxable requirement, and if you can live with it the JMS master/slave approach is not a single point of failure: when the master is down, updates will be safely stored in the JMS transactional queue and will be applied correctly as soon as you restore the master node.
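To make the point about a master outage concrete, here is a minimal in-memory sketch in plain Java. It is not the Hibernate Search JMS backend: `IndexWorkQueue` and `MasterNode` are hypothetical names, and a `java.util.ArrayDeque` stands in for the persistent, transactional JMS queue. It only illustrates that work sent while the master is offline stays queued and is applied in arrival order once the master is back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the JMS queue that nodes send index work to.
final class IndexWorkQueue {
    private final Queue<String> pending = new ArrayDeque<>();

    // Called by any node: the work is stored even if no master is consuming.
    void send(String indexWork) {
        pending.add(indexWork);
    }

    // Called by the master: returns the oldest queued work, or null if empty.
    String poll() {
        return pending.poll();
    }

    int size() {
        return pending.size();
    }
}

// Hypothetical master node: the only node that applies work to the index.
final class MasterNode {
    private final List<String> appliedWork = new ArrayList<>();
    private boolean up = true;

    void setUp(boolean up) {
        this.up = up;
    }

    // Drains the queue into the index; does nothing while the master is down.
    void drain(IndexWorkQueue queue) {
        if (!up) {
            return;
        }
        String work;
        while ((work = queue.poll()) != null) {
            appliedWork.add(work);
        }
    }

    List<String> appliedWork() {
        return appliedWork;
    }
}
```

The cost of this design is latency, not lost updates: searches on the other nodes simply see a stale index until the master has drained the queue.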

           

          3. https://docs.jboss.org/author/display/ISPN/Using+Infinispan+as+JPA-Hibernate+Second+Level+Cache+Provider

           

          4. No, it's automatic. Keep in mind however that the second level cache usually invalidates stale copies on other nodes (removes them from the cache, so they will eventually be reloaded fresh) rather than updating them, as that is usually more effective.

           

          5. your architecture looks fine. There are alternatives, but let me know first what your requirements are in terms of immediate visibility of index changes - we might then discuss your options around the JGroups backend as well if needed.

          • 2. Re: Infinispan with Hibernate and Hibernate Search
            zeeman

            Hi Sanne,

             

            I already use Infinispan as 2nd level cache from AS7.1, but it's in local mode and I need to make it cluster aware. Is that something I need to configure in the AS7.1 Infinispan subsystem? Your link doesn't mention it.

             

            Index updates don't need to be immediate; a few minutes behind is fine, as long as the cache and the index are not out of sync (something indexed that is not in the cache, or vice versa).

             

            I just like to keep things simple and efficient. I thought if I can use the same cache transport, allow nodes to be notified of changes from other nodes; then transfer the data once; and update the cache and indexes.

             

            Which approach do you recommend to accomplish the above?

             

            A question on the side: one requirement is to allow the cache to be evicted on demand. For example, if some values are changed in the DB via a batch process, use JMX to evict the cache. That should be possible with Infinispan, right?

            • 3. Re: Infinispan with Hibernate and Hibernate Search
              zeeman

              Any update?

              • 4. Re: Infinispan with Hibernate and Hibernate Search
                galder.zamarreno

                All you need to do to cluster the 2LC in AS 7.1 is to cluster two AS 7.1 instances (https://docs.jboss.org/author/x/TABKAQ). This will automatically distribute the 2LC too. No need to modify anything in your Hibernate config. Btw, remember what's said in https://docs.jboss.org/author/x/LoJ7 with regards to 2LC.

                 

                Re: cache eviction

                 

                The best way is for you to execute SessionFactory.getCache().evictEntityRegion(Class) for each entity class that you wanna evict. Do the same thing for all collections involved. This eviction will evict caches cluster wide, if clustered.

                • 5. Re: Infinispan with Hibernate and Hibernate Search
                  galder.zamarreno

                  Actually, you can simply do: SF.getCache().evictEntityRegions(), evictCollectionRegions() and evictQueryRegions() which is simpler.

                  • 6. Re: Infinispan with Hibernate and Hibernate Search
                    zeeman

                    Thanks Galder for your answers.

                     

                    Yes, AS7.1 would cluster 2LC, but I did not want to send the data twice for cache and for index updates. I wanted to use the same transport so items to be updated in 2LC and Hibernate Search/Lucene are sent once.

                     

                    As you mentioned, it'd be simple to cluster 2LC and the Lucene index separately, but I'm trying to find an elegant way of sending updated items once on the wire and then updating both the 2LC and the Lucene index. So the question becomes: what's the right design to do that?

                     

                    It's a large application with 200 tables. Only a few tables are cacheable, but there will be many index updates. Is it a good idea to send the items to be updated twice (once for 2LC and once for the Lucene index)? What do you suggest?

                    Index updates don't need to show up immediately (few minutes behind is ok).

                    • 7. Re: Infinispan with Hibernate and Hibernate Search
                      galder.zamarreno

                      zeeman wrote:

                       

                      Yes, AS7.1 would cluster 2LC, but I did not want to send the data twice for cache and for index updates. I wanted to use the same transport so items to be updated in 2LC and Hibernate Search/Lucene are sent once.

                       

                      As you mentioned, it'd be simple to cluster 2LC and the Lucene index separately, but I'm trying to find an elegant way of sending updated items once on the wire and then updating both the 2LC and the Lucene index. So the question becomes: what's the right design to do that?

                      Well, to be honest, you should consider each use case separately. 2LC is there to cache data that's accessed frequently and to avoid having to go to the database. That means you should be careful with what you cache. IOW, you should not cache everything. Cache what's gonna be needed often. So, not all Hibernate updates should result in a 2LC update. If you have entities/collections that keep changing, keeping a clustered 2LC might do more harm than good. Also, remember that with 2LC, by default we don't send the entire cache updates around. Instead, we send invalidation messages, which are pretty lightweight and only carry key information.


                      Then you have the use case of indexing where I guess you do need all the updates and there you cannot use invalidation. You need replication or distribution.
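The invalidation behaviour described above can be illustrated with a small plain-Java simulation. This is not Infinispan or Hibernate code: `InvalidatingNode` is a hypothetical class, the shared `Map` stands in for the database, and a direct method call stands in for the cluster message. The point it shows is that the message to peers carries only the key, and a peer picks up the new value on its next read from the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cluster node with a local cache in front of a shared database.
final class InvalidatingNode {
    private final Map<Long, String> database;                  // shared by all nodes
    private final Map<Long, String> localCache = new HashMap<>();
    private final List<InvalidatingNode> peers = new ArrayList<>();
    private int databaseReads;

    InvalidatingNode(Map<Long, String> database) {
        this.database = database;
    }

    void addPeer(InvalidatingNode peer) {
        peers.add(peer);
    }

    // Read-through: a cache miss falls back to the database and caches locally.
    String read(Long id) {
        String cached = localCache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.get(id);
        if (loaded != null) {
            localCache.put(id, loaded);
        }
        return loaded;
    }

    // Write: update database and local cache, then ask peers to drop their copy.
    void write(Long id, String value) {
        database.put(id, value);
        localCache.put(id, value);
        for (InvalidatingNode peer : peers) {
            peer.invalidate(id); // the "message" carries the key only, not the value
        }
    }

    void invalidate(Long id) {
        localCache.remove(id);
    }

    boolean isCached(Long id) {
        return localCache.containsKey(id);
    }

    int databaseReads() {
        return databaseReads;
    }
}
```

An index cannot work this way because there is no cheap source to reload a dropped segment from, which is why the indexing side needs replication or distribution instead.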

                       

                      So, the burden of keeping the two solutions separate is not that high if you look at inner details. It might not be ideal but looks Ok to me.

                       

                      There's always room for improvement though, so if you wanna dig into it and come up with specific actions to improve things, feel free! It's open source

                      • 8. Re: Infinispan with Hibernate and Hibernate Search
                        sannegrinovero

                        [...]

                        I wanted to use the same transport so items to be updated in 2LC and Hibernate Search/Lucene are sent once.

                        [...]
                        Will it be a good idea to send the items to be updated twice? (for 2LC and Lucene index).

                        We won't send repeated data: Hibernate 2LC is going to send only the primary keys of entities, while the Hibernate Search backend is going to send only the String-encoded tokens which need to go in the index.

                         

                        Conceptually I agree that what you would like to do is cleaner, as you won't need two JGroups channels open and wouldn't need to separate them via cluster name and different ports (so different JGroups configuration files), but in terms of functionality this won't buy you any difference, including no change in expected bandwidth usage.

                         

                        To simplify configuration we have https://hibernate.onjira.com/browse/HSEARCH-882

                        This has already been delayed for a couple of releases; I would really like to push it in Search 4.1 but it's not looking likely. If you would like to help on that, we could make it.

                        • 9. Re: Infinispan with Hibernate and Hibernate Search
                          zeeman

                          Thank you guys for your insight. After reviewing my project's domain model usage, here is exactly what is needed. I would greatly appreciate it if you could point me in the right direction:

                           

                          1. There are two cached tables (environment config) in 2LC that will be updated only during newer release deployments. A way to manually evict entries would be perfect (i.e. via JMX).
                          2. There are 10 tables that are heavily used by the app, they need to be cached in 2LC and changes visible among app server nodes immediately.
                          3. There are about 120 tables in the project and most of them are indexed via Hibernate Search. Updated entities need to be visible in indexes for all nodes. Indexes are expected to grow large fast. Few seconds latency is fine. The app stores some values in indexes to avoid DB calls.

                           

                          The main requirement is high availability (at any point in time a server is always available to serve requests) and efficiency (user request < 1 sec).

                           

                          Which Infinispan config makes the most sense?

                          Which HSearch config makes the most sense (backend, one index with JMS, or each server with its own index, etc.)?

                           

                          Sanne, I would not mind helping with HSEARCH-882 if it's what would make sense for my requirements.

                           

                          Thanks again!

                          • 10. Re: Infinispan with Hibernate and Hibernate Search
                            sannegrinovero

                            Hi zeeman,

                            questions 1 and 2 sound like they are easily solved using Infinispan's 2LC, which is not under my expertise so I'll focus on question 3.

                             

                            The key point is "few seconds latency is fine": that's good, as the async indexing backend is preferable to the sync one.

                             

                            Then you can either synchronize indexes via filesystem replicas or use a shared index on Infinispan (as described under master/slave indexing in the Hibernate Search documentation).

                             

                            If your main requirement is high availability you should go for JMS for the backend queue (since it has a persistent queue and stores/takes are transactional). You might want to use JGroups instead of JMS if you already use JGroups because you're using Infinispan: it's likely going to be faster, but you won't have the persistent queue.

                             

                            For the index storage: shared filesystem is easier and widely tested, but is typically used with 30 minutes / 1 hour period of index copies.

                            If you need index replicas to be in sync in "a couple of seconds", then you have no other choice than using Infinispan to store it, which is good in terms of performance but adds some complexity in your configuration.

                             

                            Note that storing the index in Infinispan will likely need a CacheLoader to permanently store the index when the cluster is shut down (unless you want to reindex); when using Hibernate I guess you have a JDBC datasource, so you could use the same database to store the index caches. Nice and easy for backups as you have to snapshot one database containing data+index.
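As a rough illustration of the combination discussed above (index stored in Infinispan, asynchronous indexing, JMS as the backend queue), the settings could be assembled like this. This is a sketch under assumptions: the property names are given as I recall them from the Hibernate Search 4.x reference and should be checked against the version in use, `SearchSettings` is a hypothetical helper class, and the resource name and JNDI names passed in are placeholders. On AS7 these would normally go into `persistence.xml` rather than be built in code.

```java
import java.util.Properties;

// Hypothetical helper that collects Hibernate Search settings in one place.
final class SearchSettings {
    private SearchSettings() {
    }

    // Index stored in Infinispan; index work is applied asynchronously.
    static Properties infinispanIndex(String infinispanConfigResource) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider", "infinispan");
        // Infinispan configuration used by the dedicated, Search-managed instance.
        p.setProperty("hibernate.search.infinispan.configuration_resourcename",
                infinispanConfigResource);
        p.setProperty("hibernate.search.default.worker.execution", "async");
        return p;
    }

    // Returns a copy of the base settings with the JMS backend added (non-master nodes).
    static Properties withJmsBackend(Properties base,
                                     String connectionFactoryJndi,
                                     String queueJndi) {
        Properties p = new Properties();
        p.putAll(base);
        p.setProperty("hibernate.search.default.worker.backend", "jms");
        p.setProperty("hibernate.search.default.worker.jms.connection_factory",
                connectionFactoryJndi);
        p.setProperty("hibernate.search.default.worker.jms.queue", queueJndi);
        return p;
    }
}
```

The CacheLoader that persists the index caches is configured in the Infinispan configuration file itself, not through these properties.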

                            • 11. Re: Infinispan with Hibernate and Hibernate Search
                              zeeman

                              Hi Sanne,

                               

                              Thank you for your detailed answer.

                               

                              What I got from your latest reply is that Hibernate Search with an async Infinispan backend + cache store will be the ideal way to go for my requirements.

                               

                              I have a Postgres DB, and I would rather not store the index in the DB but use the file system. But with this approach, if each AS7 node has its own Infinispan instance and file system cache store, won't there be too much copying of Lucene indexes as entity updates happen?

                              I could see this working like you said, if index is copied after 15-30min. But that would be too much latency. If all infinispan instances use a shared file system for cache store, that becomes a single point of failure. This is exactly what I have been struggling with.

                               

                              Going JMS route, we still have the same problem of copying indexes from master to slave too often. Infinispan is the only option I have as you said for few seconds latency. But how to store the index becomes the problem.

                               

                              I think what is needed is the ability to use a shared file system with Infinispan, but to allow Infinispan to accept an alternate backup shared file system in case the first one fails. Or something of that sort.

                               

                              Am I complicating this too much? Isn't what I'm looking for just a typical high availability solution? How do people do it? Or at least something very close to high availability.

                              • 12. Re: Infinispan with Hibernate and Hibernate Search
                                sannegrinovero

                                If all infinispan instances use a shared file system for cache store, that becomes a single point of failure. This is exactly what I have been struggling with.

                                 

                                Consider that your cache store *is* a backup of Infinispan's contents. You might very well delete some files or have the disks crash and your app will still work via the Infinispan stored in-memory index and distributed replicas.

                                When setting up high availability you have to ask yourself how many copies you need, and how reliable each one is. If two replicas are not good enough for you, maybe you need 3.

                                 

                                Technically your system is already "highly available": just by using Infinispan you have multiple replicas. I'm suggesting a CacheLoader just so that it's easier to do maintenance, as you might want to shut down all nodes, in which case Infinispan would "forget" all its state since it's in memory.

                                 

                                Also consider that the index is actually not critical. You can always rebuild it from the data, so what is important for you is the Postgres content.

                                 

                                I would personally avoid the filesystem CacheLoader unless you can load-test it beforehand, as it's not the fastest implementation available. Assuming your current point of failure is Postgres anyway, and you're having it scale in some way, I would consider it a good candidate for storing the index. Databases do know how to store reliably.

                                 

                                If you want to play with very high reliability, you could use the S3 CacheLoader and have it store index segments on Amazon's S3, which means in at least 3 different datacenters. This is just an example: you have options, but it seems more like you need to clarify your requirements, as it will always be a tradeoff that also involves performance and overall complexity.

                                • 13. Re: Infinispan with Hibernate and Hibernate Search
                                  galder.zamarreno

                                  zeeman wrote:

                                   

                                  Thank you guys for your insight. After reviewing my project's domain model usage, here is exactly what is needed. I would greatly appreciate it if you could point me in the right direction:

                                   

                                  1. There are two cached tables (environment config) in 2LC that will be updated only during newer release deployments. A way to manually evict entries would be perfect (i.e. via JMX).

                                  I'm not aware of Hibernate having such a thing, but you can easily build that.
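One way to build it, sketched with the JDK's own JMX API only: an MBean exposing a single `evictAll` operation that runs whatever eviction logic the application passes in. `CacheEvictionControl`, the operation name, and the object name used when registering are all hypothetical. In the real application the `Runnable` would call the `SessionFactory.getCache()` eviction methods mentioned in replies 4 and 5; it is left abstract here so the sketch has no Hibernate dependency. A standard MBean (an interface named `...MBean` plus an implementation) is the more common route; the `DynamicMBean` form is used only to keep the sketch in one class.

```java
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanException;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

// Hypothetical MBean exposing one operation that evicts the second-level cache.
final class CacheEvictionControl implements DynamicMBean {
    static final String EVICT_ALL = "evictAll";

    private final Runnable evictAll; // application-supplied eviction logic

    CacheEvictionControl(Runnable evictAll) {
        this.evictAll = evictAll;
    }

    // Registers the MBean under the given name, e.g. "com.example:type=CacheEvictionControl".
    static ObjectName register(MBeanServer server, String objectName, Runnable evictAll)
            throws JMException {
        ObjectName name = new ObjectName(objectName);
        server.registerMBean(new CacheEvictionControl(evictAll), name);
        return name;
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature)
            throws MBeanException, ReflectionException {
        if (!EVICT_ALL.equals(action)) {
            throw new ReflectionException(new NoSuchMethodException(action));
        }
        evictAll.run();
        return null;
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        // A null parameter array means "no parameters".
        MBeanOperationInfo evict = new MBeanOperationInfo(
                EVICT_ALL, "Evict all second-level cache regions",
                null, "void", MBeanOperationInfo.ACTION);
        return new MBeanInfo(getClass().getName(), "On-demand 2LC eviction",
                null, null, new MBeanOperationInfo[] { evict }, null);
    }

    // This MBean has no attributes.
    @Override
    public Object getAttribute(String attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException(attribute);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException(attribute.getName());
    }

    @Override
    public AttributeList getAttributes(String[] attributes) {
        return new AttributeList();
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList();
    }
}
```

Once registered on the server's MBean server, the operation shows up in any JMX console, so the batch process (or an operator) can trigger eviction after changing the tables directly.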

                                  zeeman wrote:

                                   

                                  2. There are 10 tables that are heavily used by the app, they need to be cached in 2LC and changes visible among app server nodes immediately.

                                  The default Infinispan config for 2LC provides guarantees that data will be invalidated (as opposed to be replicated) immediately. Invalidation means that updates remove data from other nodes' memory, so next time data is needed, they'd go to the database. The default configuration (for Hibernate 4.x), can be found in https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/resources/org/hibernate/cache/infinispan/builder/infinispan-configs.xml

                                   

                                  Changes to this config are possible but have to be done with care. The advanced configuration explained in https://docs.jboss.org/author/x/FgY5 shows you how you can use an alternative Infinispan configuration, and how you can tweak the cache that each entity/collection uses, but I'd strongly suggest you start with the defaults, and only when you want to start tweaking performance you worry about customizing it further.

                                  • 14. Re: Infinispan with Hibernate and Hibernate Search
                                    zeeman

                                    Thank you Galder and Sanne.

                                     

                                    I'll set up a test environment and test what you suggested. I'll report back my findings; I'm sure I'll be running into issues.
