Recommendation for IsolationLevel
kringdahl Jun 4, 2008 5:21 PM
We are seeing lots of replication timeout exceptions and have experimented extensively with the different isolation levels and locking schemes, with little success. Everything works with a single-node cluster. Once we add a second node to the cluster and attempt concurrent writes to the same node in the tree cache, we see lots of timeout exceptions. I believe we need SERIALIZABLE as the IsolationLevel since we need to ensure global synchronization, but it does not seem to be locking the nodes appropriately. Our environment is JBoss AS 4.2.2.GA and JBoss Cache 2.0.0.GA.
A few questions about locking and transactions:
- With the SERIALIZABLE IsolationLevel, shouldn't reads of any node touched in the cache be blocked until the transaction commits? When is the lock acquired?
- Can you recommend appropriate configurations for an environment with a reasonably high transaction volume? Basically, we are looking for the ability to synchronize across the entire boundaries of a transaction. In general, a transaction would take 10 seconds or less.
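The contention behind these questions can be illustrated without JBoss Cache at all. The following is only a minimal sketch in plain Java, assuming a simple model in which each cache node is guarded by an exclusive write lock and a writer gives up after a fixed wait (loosely analogous to LockAcquisitionTimeout). The class and method names (LockTimeoutSketch, Node, write, contendedWrite) are hypothetical, and this is not how JBoss Cache implements its locking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockTimeoutSketch {

    /** Hypothetical stand-in for one cache node guarded by a read/write lock. */
    static final class Node {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int value;
    }

    /**
     * Tries to write to the node, waiting at most timeoutMs for the exclusive lock.
     * Returns false if the lock could not be acquired in time.
     */
    static boolean write(Node node, int newValue, long timeoutMs) throws InterruptedException {
        if (!node.lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // the analogue of a lock acquisition timeout
        }
        try {
            node.value = newValue;
            return true;
        } finally {
            node.lock.writeLock().unlock();
        }
    }

    /**
     * Simulates a long-running "transaction" thread that holds the write lock
     * while the calling thread attempts a write with the given timeout.
     */
    static boolean contendedWrite(long timeoutMs) throws InterruptedException {
        Node node = new Node();
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread tx = new Thread(() -> {
            node.lock.writeLock().lock();
            try {
                locked.countDown();
                finish.await(); // hold the lock until told to "commit"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                node.lock.writeLock().unlock();
            }
        });
        tx.start();
        locked.await(); // make sure the other thread owns the lock first
        try {
            return write(node, 42, timeoutMs);
        } finally {
            finish.countDown(); // let the "transaction" commit and release
            tx.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("contended write succeeded: " + contendedWrite(100));
        System.out.println("uncontended write succeeded: " + write(new Node(), 1, 100));
    }
}
```

In this model, a writer fails whenever another thread holds the node's lock for longer than the writer is willing to wait, which is the same shape of failure as a transaction of up to 10 seconds meeting a shorter timeout elsewhere in the stack.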
Here is our existing config:
<?xml version="1.0" encoding="UTF-8"?>

<!-- ===================================================================== -->
<!--                                                                       -->
<!--  Sample TreeCache Service Configuration                               -->
<!--                                                                       -->
<!-- ===================================================================== -->

<server>

   <!-- ==================================================================== -->
   <!-- Defines TreeCache configuration                                      -->
   <!-- ==================================================================== -->

   <mbean code="org.jboss.cache.pojo.jmx.PojoCacheJmxWrapper"
          name="jboss.cache:service=TreeCache">

      <depends>jboss:service=Naming</depends>
      <depends>jboss:service=TransactionManager</depends>

      <!-- Configure the TransactionManager -->
      <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</attribute>

      <!--
         Isolation level : SERIALIZABLE
                           REPEATABLE_READ (default)
                           READ_COMMITTED
                           READ_UNCOMMITTED
                           NONE
      -->
      <attribute name="IsolationLevel">SERIALIZABLE</attribute>

      <!--
         Valid modes are LOCAL
                         REPL_ASYNC
                         REPL_SYNC
                         INVALIDATION_ASYNC
                         INVALIDATION_SYNC
      -->
      <attribute name="CacheMode">REPL_SYNC</attribute>

      <!--
         Node locking scheme: OPTIMISTIC
                              PESSIMISTIC (default)
      -->
      <attribute name="NodeLockingScheme">PESSIMISTIC</attribute>

      <!-- Just used for async repl: use a replication queue -->
      <attribute name="UseReplQueue">false</attribute>

      <!-- Replication interval for replication queue (in ms) -->
      <attribute name="ReplQueueInterval">0</attribute>

      <!-- Max number of elements which trigger replication -->
      <attribute name="ReplQueueMaxElements">0</attribute>

      <!--
         Name of cluster. Needs to be the same for all TreeCache nodes in a
         cluster in order to find each other. Needs to be different in order
         to maintain separate caches.
      -->
      <attribute name="ClusterName">kr-dtFabricCache</attribute>

      <!--
         Uncomment next three statements to enable JGroups multiplexer.
         This configuration is dependent on the JGroups multiplexer being
         registered in an MBean server such as JBossAS.
      -->
      <!--
      <depends>jgroups.mux:name=Multiplexer</depends>
      <attribute name="MultiplexerService">jgroups.mux:name=Multiplexer</attribute>
      <attribute name="MultiplexerStack">fc-fast-minimalthreads</attribute>
      -->

      <!--
         JGroups protocol stack properties.
         ClusterConfig isn't used if the multiplexer is enabled and
         successfully initialized.
      -->
      <attribute name="ClusterConfig">
         <config>
            <UDP mcast_addr="228.10.10.10"
                 mcast_port="50008"
                 tos="8"
                 ucast_recv_buf_size="20000000"
                 ucast_send_buf_size="640000"
                 mcast_recv_buf_size="25000000"
                 mcast_send_buf_size="640000"
                 loopback="false"
                 discard_incompatible_packets="true"
                 max_bundle_size="64000"
                 max_bundle_timeout="30"
                 use_incoming_packet_handler="true"
                 ip_ttl="2"
                 enable_bundling="false"
                 enable_diagnostics="true"
                 use_concurrent_stack="true"
                 thread_naming_pattern="pl"
                 thread_pool.enabled="true"
                 thread_pool.min_threads="1"
                 thread_pool.max_threads="25"
                 thread_pool.keep_alive_time="30000"
                 thread_pool.queue_enabled="true"
                 thread_pool.queue_max_size="10"
                 thread_pool.rejection_policy="Run"
                 oob_thread_pool.enabled="true"
                 oob_thread_pool.min_threads="1"
                 oob_thread_pool.max_threads="4"
                 oob_thread_pool.keep_alive_time="10000"
                 oob_thread_pool.queue_enabled="true"
                 oob_thread_pool.queue_max_size="10"
                 oob_thread_pool.rejection_policy="Run"/>
            <PING timeout="2000" num_initial_members="3"/>
            <MERGE2 max_interval="30000" min_interval="10000"/>
            <FD_SOCK/>
            <FD timeout="10000" max_tries="5" shun="true"/>
            <VERIFY_SUSPECT timeout="1500"/>
            <pbcast.NAKACK max_xmit_size="60000"
                           use_mcast_xmit="false"
                           gc_lag="0"
                           retransmit_timeout="300,600,1200,2400,4800"
                           discard_delivered_msgs="true"/>
            <UNICAST timeout="300,600,1200,2400,3600"/>
            <pbcast.STABLE stability_delay="1000"
                           desired_avg_gossip="50000"
                           max_bytes="400000"/>
            <AUTH auth_class="org.jgroups.auth.MD5Token"
                  auth_value="desktone"
                  token_hash="MD5"/>
            <pbcast.GMS print_local_addr="true"
                        join_timeout="5000"
                        join_retry_timeout="2000"
                        shun="false"
                        view_bundling="true"
                        view_ack_collection_timeout="5000"/>
            <FRAG2 frag_size="60000"/>
            <pbcast.STREAMING_STATE_TRANSFER use_reading_thread="true"/>
            <!-- <pbcast.STATE_TRANSFER/> -->
            <pbcast.FLUSH timeout="0"/>
         </config>
      </attribute>

      <!--
         Whether or not to fetch state on joining a cluster.
         NOTE this used to be called FetchStateOnStartup and has been
         renamed to be more descriptive.
      -->
      <attribute name="FetchInMemoryState">false</attribute>

      <!--
         The max amount of time (in milliseconds) we wait until the state
         (i.e. the contents of the cache) is retrieved from existing members
         in a clustered environment
      -->
      <attribute name="StateRetrievalTimeout">15000</attribute>

      <!--
         Number of milliseconds to wait until all responses for a
         synchronous call have been received.
      -->
      <attribute name="SyncReplTimeout">15000</attribute>

      <!-- Max number of milliseconds to wait for a lock acquisition -->
      <attribute name="LockAcquisitionTimeout">30000</attribute>

      <!--
         Indicate whether to use region based marshalling or not.
         Set this to true if you are running under a scoped class loader,
         e.g., inside an application server. Default is "false".
      -->
      <attribute name="UseRegionBasedMarshalling">false</attribute>

      <!-- Cache Loader configuration block -->
      <attribute name="CacheLoaderConfig">
         <config>
            <!-- if passivation is true, only the first cache loader is used; the rest are ignored -->
            <passivation>false</passivation>
            <preload>/</preload>
            <shared>true</shared>

            <!-- we can now have multiple cache loaders, which get chained -->
            <cacheloader>
               <class>org.jboss.cache.loader.JDBCCacheLoader</class>
               <properties>
                  cache.jdbc.table.name=dht
                  cache.jdbc.table.primarykey=dht_pk
                  cache.jdbc.table.create=true
                  cache.jdbc.table.drop=false
                  cache.jdbc.fqn.column=fqn
                  cache.jdbc.fqn.type=varchar(255)
                  cache.jdbc.node.column=value
                  cache.jdbc.node.type=LONGBLOB
                  cache.jdbc.parent.column=parent_fqn
                  cache.jdbc.datasource=java:/jdbc/FabricDS
                  cache.jdbc.sql-concat=concat(1,2)
               </properties>

               <!-- whether the cache loader writes are asynchronous -->
               <async>false</async>

               <!--
                  only one cache loader in the chain may set fetchPersistentState
                  to true. An exception is thrown if more than one cache loader
                  sets this to true.
               -->
               <fetchPersistentState>false</fetchPersistentState>

               <!-- determines whether this cache loader ignores writes - defaults to false. -->
               <ignoreModifications>false</ignoreModifications>

               <purgeOnStartup>false</purgeOnStartup>
            </cacheloader>
         </config>
      </attribute>

      <!-- Buddy Replication config -->
      <attribute name="BuddyReplicationConfig">
         <config>
            <!-- Enables buddy replication. This is the ONLY mandatory configuration element here. -->
            <buddyReplicationEnabled>false</buddyReplicationEnabled>

            <!-- These are the default values anyway -->
            <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>

            <!--
               numBuddies is the number of backup nodes each node maintains.
               ignoreColocatedBuddies means that each node will *try* to select
               a buddy on a different physical host. If not able to do so
               though, it will fall back to colocated nodes.
            -->
            <buddyLocatorProperties>
               numBuddies = 1
               ignoreColocatedBuddies = true
            </buddyLocatorProperties>

            <!--
               A way to specify a preferred replication group. If specified, we
               try to pick a buddy who shares the same pool name (falling back
               to other buddies if not available). This allows the sysadmin to
               hint at how backup buddies are picked, so for example, nodes may
               be hinted to pick buddies on a different physical rack or power
               supply for added fault tolerance.
               Note: to override this value, use system property desktone.cache.buddyName
            -->
            <buddyPoolName>myBuddyPoolReplicationGroup</buddyPoolName>

            <!--
               Communication timeout for inter-buddy group organisation messages
               (such as assigning to and removing from groups). Defaults to 1000.
            -->
            <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>

            <!-- Whether data is removed from old owners when gravitated to a new owner. Defaults to true. -->
            <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>

            <!--
               Whether backup nodes can respond to data gravitation requests, or
               only the data owner is supposed to respond. Defaults to true.
            -->
            <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>

            <!--
               Whether all cache misses result in a data gravitation request.
               Defaults to false, requiring callers to enable data gravitation
               on a per-invocation basis using the Options API.
            -->
            <autoDataGravitation>false</autoDataGravitation>
         </config>
      </attribute>

   </mbean>

   <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
   <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
   <!-- <depends>jboss.cache:service=TreeCache</depends>-->
   <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
   <!-- </mbean>-->

</server>