1 Reply Latest reply on Nov 18, 2011 5:39 AM by Sanne Grinovero

    Hibernate Search - Infinispan - JGroups optimization on Amazon clustering environment

    Dzung Leonhart

      Hi Infinispan team,

       

      I'm using Hibernate Search with Infinispan as the directory provider and JGroups as the cluster transport for clustering on Amazon EC2.

      My project is already in production, and I would appreciate your advice on performance tuning.

       

      Here are my configurations:

       

           1. Spring bean

       

           <bean id="sessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">  

              <property name="hibernateProperties">

                  <props>

                      <prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>

                      <prop key="hibernate.search.default.directory_provider">infinispan</prop>

                      <prop key="hibernate.search.infinispan.configuration_resourcename">hibernate-search-infinispan.xml</prop>

                  </props>

              </property>      

          </bean>

       

           2. hibernate-search-infinispan.xml

               

      <?xml version="1.0" encoding="UTF-8"?>

      <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"

          xmlns="urn:infinispan:config:5.0">

       

       

          <!-- *************************** -->

          <!-- System-wide global settings -->

          <!-- *************************** -->

          <global>

              <!-- Duplicate domains are allowed so that multiple deployments with default

                  configuration of Hibernate Search applications work - if possible it would

                  be better to use JNDI to share the CacheManager across applications -->

              <globalJmxStatistics enabled="false"

                  cacheManagerName="HibernateSearch" allowDuplicateDomains="true" />

       

              <transport clusterName="infinispan-hibernate-search-cluster"

                  distributedSyncTimeout="60000"

                  transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">

                  <properties>

                      <property name="configurationFile" value="jdbc_ping.xml" />

                  </properties>

              </transport>

       

              <!-- Used to register JVM shutdown hooks. hookBehavior: DEFAULT, REGISTER,

                  DONT_REGISTER. Hibernate Search takes care to stop the CacheManager so registering

                  is not needed -->

              <shutdown hookBehavior="DONT_REGISTER" />

          </global>

       

       

          <!-- *************************** -->

          <!-- Default "template" settings -->

          <!-- *************************** -->

          <default>

              <locking lockAcquisitionTimeout="60000" writeSkewCheck="false"

                  concurrencyLevel="500" useLockStriping="false" />

       

              <!-- Invocation batching is required for use with the Lucene Directory -->

              <invocationBatching enabled="true" />

       

       

              <!-- This element specifies that the cache is clustered. modes supported:

                  distribution (d), replication (r) or invalidation (i). Don't use invalidation

                  to store Lucene indexes (as with Hibernate Search DirectoryProvider). Replication

                  is recommended for best performance of Lucene indexes, but make sure you

                  have enough memory to store the index in your heap. Also, distribution scales

                  much better than replication on a high number of nodes in the cluster. -->

              <clustering mode="replication">

                  <!-- Prefer loading all data at startup rather than later -->

                  <stateRetrieval timeout="60000" logFlushTimeout="60000"

                      fetchInMemoryState="true" alwaysProvideInMemoryState="true" />

                  <!-- Network calls are synchronous by default -->

                  <sync replTimeout="60000" />

              </clustering>

              <jmxStatistics enabled="false" />

              <eviction maxEntries="-1" strategy="NONE" />

              <expiration maxIdle="-1" />

          </default>

       

       

          <!-- ******************************************************************************* -->

          <!-- Individually configured "named" caches. -->

          <!-- -->

          <!-- While default configuration happens to be fine with similar settings across the -->

          <!-- three caches, they should generally be different in a production environment. -->

          <!-- -->

          <!-- Current settings could easily lead to OutOfMemory exception as a CacheStore -->

          <!-- should be enabled, and maybe distribution is desired. -->

          <!-- ******************************************************************************* -->

       

       

          <!-- *************************************** -->

          <!-- Cache to store Lucene's file metadata -->

          <!-- *************************************** -->

          <namedCache name="LuceneIndexesMetadata">

              <clustering mode="replication">

                  <stateRetrieval fetchInMemoryState="true"

                      logFlushTimeout="60000" />

                  <sync replTimeout="60000" />

              </clustering>

          </namedCache>

       

       

          <!-- **************************** -->

          <!-- Cache to store Lucene data -->

          <!-- **************************** -->

          <namedCache name="LuceneIndexesData">

              <clustering mode="replication">

                  <stateRetrieval fetchInMemoryState="true"

                      logFlushTimeout="60000" />

                  <sync replTimeout="60000" />

              </clustering>

          </namedCache>

       

       

          <!-- ***************************** -->

          <!-- Cache to store Lucene locks -->

          <!-- ***************************** -->

          <namedCache name="LuceneIndexesLocking">

              <clustering mode="replication">

                  <stateRetrieval fetchInMemoryState="true"

                      logFlushTimeout="60000" />

                  <sync replTimeout="60000" />

              </clustering>

          </namedCache>

       

      </infinispan>

       

           3. jdbc_ping.xml

       

      <config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="urn:org:jgroups JGroups-2.12.xsd">

          <TCP bind_port="${jgroups.tcp.port:7800}"

              loopback="true" port_range="30" recv_buf_size="20000000"

              send_buf_size="640000" discard_incompatible_packets="true"

              max_bundle_size="64000" max_bundle_timeout="30" enable_bundling="true"

              use_send_queues="true" sock_conn_timeout="300" enable_diagnostics="false"

       

              thread_pool.enabled="true" thread_pool.min_threads="2"

              thread_pool.max_threads="30" thread_pool.keep_alive_time="5000"

              thread_pool.queue_enabled="false" thread_pool.queue_max_size="100"

              thread_pool.rejection_policy="Discard" oob_thread_pool.enabled="true"

              oob_thread_pool.min_threads="2" oob_thread_pool.max_threads="30"

              oob_thread_pool.keep_alive_time="5000" oob_thread_pool.queue_enabled="false"

              oob_thread_pool.queue_max_size="100" oob_thread_pool.rejection_policy="Discard" />

       

          <JDBC_PING connection_driver="com.mysql.jdbc.Driver"

              connection_username="root" connection_password="root"

              connection_url="jdbc:mysql://localhost/clientdb2" level="debug" />

       

       

          <MERGE2 max_interval="30000" min_interval="10000" />

          <FD_SOCK />

          <FD timeout="3000" max_tries="3" />

          <VERIFY_SUSPECT timeout="1500" />

          <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"

              retransmit_timeout="300,600,1200,2400,4800" discard_delivered_msgs="false" />

          <UNICAST timeout="300,600,1200" />

          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

              max_bytes="400000" />

       

          <pbcast.GMS print_local_addr="false" join_timeout="7000"

              view_bundling="true" />

          <UFC max_credits="2000000" min_threshold="0.10" />

          <MFC max_credits="2000000" min_threshold="0.10" />

          <FRAG2 frag_size="60000" />

          <pbcast.STREAMING_STATE_TRANSFER bind_port="7850"/>

      </config>

       

      ---------------------

       

      There are several things about these configurations that concern me:

       

           1. When I modify the searchable data, the changes are propagated almost immediately to the other nodes in the cluster. Should I add some latency to this process, and if so, how?

       

           2. I see these lines spamming the logs:

       

      2011-11-18 16:08:19,852 [Timer-2,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Removed bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster from database.

      2011-11-18 16:08:19,855 [Timer-2,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Registered bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster into database.

      2011-11-18 16:08:21,440 [Timer-4,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Removed bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster from database.

      2011-11-18 16:08:21,443 [Timer-4,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Registered bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster into database.

      2011-11-18 16:08:41,698 [Timer-4,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Removed bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster from database.

      2011-11-18 16:08:41,701 [Timer-4,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Registered bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster into database.

      2011-11-18 16:08:50,585 [Timer-5,infinispan-hibernate-search-cluster,HK6HZP1-29948] DEBUG org.jgroups.protocols.JDBC_PING - Removed bf97191e-13f4-b3f9-0e45-41e902f5131f for clustername infinispan-hibernate-search-cluster from database.

          

           It seems each node continuously pings the database to register itself with the cluster. Should I increase the ping interval, and if so, how?

       

           3. I have real difficulty understanding these configurations. Could you point me to some documentation about them?

       

      Thanks a lot, and best regards,

      Dung Ngo.

        1. Re: Hibernate Search - Infinispan - JGroups optimization on Amazon clustering environment
          Sanne Grinovero

          1. When I modify the searchable data, the changes are propagated almost immediately to the other nodes in the cluster. Should I add some latency to this process, and if so, how?

          Which version of Hibernate Search are you using? You could consider enabling the exclusive_index_use option, using an asynchronous backend, or ideally using the JMS or JGroups backend as described in the Hibernate Search documentation.
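          A minimal sketch of the hibernateProperties block from the Spring bean above with these suggestions applied. It assumes Hibernate Search 4.x property names (hibernate.search.default.exclusive_index_use, hibernate.search.default.worker.execution, hibernate.search.default.worker.backend); some worker properties were spelled differently in older releases, so check them against the reference guide for the version you actually run. exclusive_index_use is intended for a node that is the only writer of an index, which is why it pairs naturally with a single-master JMS or JGroups backend.

```xml
<property name="hibernateProperties">
    <props>
        <prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>
        <prop key="hibernate.search.default.directory_provider">infinispan</prop>
        <prop key="hibernate.search.infinispan.configuration_resourcename">hibernate-search-infinispan.xml</prop>

        <!-- Keep the IndexWriter open between changes instead of reopening it;
             only safe while this node is the sole writer of the index -->
        <prop key="hibernate.search.default.exclusive_index_use">true</prop>

        <!-- Apply index changes asynchronously to the transaction commit
             (assumed Hibernate Search 4.x property name) -->
        <prop key="hibernate.search.default.worker.execution">async</prop>

        <!-- Optional: route all index writes to a single master node.
             Use jgroupsMaster on that node and jgroupsSlave on the others;
             the JGroups channel for this backend needs its own setup (not shown). -->
        <!-- <prop key="hibernate.search.default.worker.backend">jgroupsSlave</prop> -->
    </props>
</property>
```

          The async option trades a short delay before changes become searchable for less work at commit time; the master/slave backend goes further by making a single node responsible for all index writes.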

           

          JDBC_PING should not register and deregister itself that often; I'm not sure why you are seeing that. Do you also see Infinispan messages being logged about frequent view changes?
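          The "Removed ... / Registered ..." lines come from JDBC_PING, and the posted jdbc_ping.xml sets level="debug" on that protocol, which enables DEBUG logging for it. A hedged sketch of two changes to that file: lowering the protocol's log level hides the messages, and, assuming the re-registrations follow the periodic discovery rounds started by MERGE2, longer MERGE2 intervals should make them less frequent. The interval values below are illustrative, not tuned recommendations, and this does not explain the cause Sanne asks about.

```xml
<!-- Same JDBC_PING element as in jdbc_ping.xml, but logging only warnings
     instead of every register/remove cycle -->
<JDBC_PING connection_driver="com.mysql.jdbc.Driver"
    connection_username="root" connection_password="root"
    connection_url="jdbc:mysql://localhost/clientdb2" level="warn" />

<!-- Assumption: each MERGE2 round triggers a JDBC_PING discovery that rewrites
     this node's row; longer intervals (illustrative values) mean fewer rewrites
     but slower merging after a network split -->
<MERGE2 max_interval="60000" min_interval="30000" />
```

          Checking the Infinispan log for frequent view changes, as suggested, is still the way to tell whether the re-registrations point to an unstable cluster rather than normal discovery.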

           

          I have real difficulty understanding these configurations. Could you point me to some documentation about them?

          The documentation is spread across the Infinispan, JGroups and Hibernate Search projects: please look at each project's website.