We are planning to load 400 million records into a data grid.
As part of this we started a POC (proof of concept) with Infinispan as the data grid, using 4 million records.
Infrastructure details:- (All running in one server)
One Hot Rod client (running from JBoss SOA AS)
Two Hot Rod servers (each started with -Xms4048m -Xmx4048m, roughly 4 GB of heap each, i.e. 8 GB total for the Hot Rod cluster)
The first 4 million records loaded fine; the run completed in 52 minutes. RES memory on the Linux server was 2.6 GB for each Hot Rod server.
We validated currentNumberOfEntries=4000000 in each node.
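As an aside on the load itself: if the load loop does one put per record, each record costs a network round-trip to the Hot Rod server. Since `RemoteCache` implements `java.util.Map`, batching via `putAll` usually improves throughput considerably. Below is a minimal, self-contained sketch of such a batching helper; the class name, method name, and batch size are our own choices, not an Infinispan API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class BatchLoader {

    // Pushes entries into the target map in fixed-size batches via putAll.
    // With a Hot Rod RemoteCache (which implements java.util.Map) this turns
    // one network round-trip per entry into one round-trip per batch.
    public static <K, V> long loadInBatches(Map<K, V> target,
                                            Iterator<Map.Entry<K, V>> source,
                                            int batchSize) {
        Map<K, V> batch = new HashMap<K, V>(batchSize);
        long total = 0;
        while (source.hasNext()) {
            Map.Entry<K, V> e = source.next();
            batch.put(e.getKey(), e.getValue());
            if (batch.size() >= batchSize) {
                target.putAll(batch);
                total += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {        // flush the final partial batch
            target.putAll(batch);
            total += batch.size();
        }
        return total;
    }
}
```

A batch size of a few thousand entries is a common starting point; the right value depends on entry size and network latency, so it is worth measuring.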
We then tried loading another set of 4 million records. Unexpected things happened during this run; the complete logs are attached.
Log snippets:-
(Incoming-1001,InfinispanCluster,dlt21app-48881) dlt21app-48881: dropped message from dlt21app-24281 (not in xmit_table), keys are [dlt21app-48881], view=[dlt21app-48881|2] [dlt21app-48881]
2010-11-20 20:00:44,972 ERROR [JoinTask] (Rehasher-dlt21app-24281) Caught exception!
java.lang.IllegalStateException: Join cannot be complete without rehash to finish (node dlt21app-24281 )
The actual size of each 4-million-record file is 530 MB, and we are not even loading the complete record.
Details of each cache entry:-
Key - "Field1"+","+"Field2" (from the file), max 13 bytes
Value - a POJO (a String holding the same key, plus primitives: 4 Java ints and 1 float).
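For reference, the value POJO described above might look like the sketch below. The class and field names are hypothetical; `Serializable` is assumed so that the default marshalling can handle the object over Hot Rod.

```java
import java.io.Serializable;

// A sketch of the value object described above: the same key string that is
// used as the cache key, plus four ints and one float. Names are hypothetical.
public class RecordValue implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String key;          // same "Field1,Field2" string as the cache key
    private final int f1, f2, f3, f4;  // the four int fields from the file
    private final float f5;            // the single float field

    public RecordValue(String key, int f1, int f2, int f3, int f4, float f5) {
        this.key = key;
        this.f1 = f1; this.f2 = f2; this.f3 = f3; this.f4 = f4;
        this.f5 = f5;
    }

    public String getKey() { return key; }
    public int getF1()     { return f1; }
    public float getF5()   { return f5; }
}
```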
Each cache entry's raw payload is about 46 bytes: 2 × 13 bytes (the key, plus the same key stored in the value) and 20 bytes for the 5 primitives (4 ints at 4 bytes each and 1 float at 4 bytes).
The total should therefore be about 368 MB (46 bytes × 8 million records).
With numOwners="2" it should not go beyond roughly 736 MB, at most 1 GB.
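One caveat on this arithmetic: the raw payload (two copies of the ~13-byte key plus 4 ints and 1 float, i.e. about 46 bytes if an int and a float are 4 bytes each) is typically dwarfed by per-entry JVM overhead for entries this small: object headers, references, `char[]` backing arrays at 2 bytes per char, and the grid's own per-entry bookkeeping. The sketch below makes this explicit; the 200-byte overhead figure is a rough assumption, not a measurement.

```java
public class EntrySizeEstimate {
    public static void main(String[] args) {
        long entries = 8_000_000L;

        // Raw payload per entry: two copies of the ~13-byte key plus
        // 4 ints (4 bytes each) and 1 float (4 bytes).
        long rawPayload = 2 * 13 + 4 * 4 + 4;                      // = 46 bytes

        // Rough per-entry JVM overhead (assumed figure): object headers for
        // the POJO and two Strings, char[] backing arrays (2 bytes/char),
        // references, and the data grid's own per-entry bookkeeping.
        // 150-250 bytes is easily reached in practice.
        long assumedOverhead = 200;

        long rawTotal  = entries * rawPayload;                     // ~368 MB
        long realTotal = entries * (rawPayload + assumedOverhead); // ~2 GB

        System.out.println("raw payload total = " + rawTotal / 1_000_000 + " MB");
        System.out.println("with overhead     ~ " + realTotal / 1_000_000 + " MB");
    }
}
```

That back-of-envelope number, doubled again for numOwners="2", goes a long way toward explaining heap usage far above the raw file sizes; a jmap histogram of a loaded node would confirm the real per-entry cost.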
As I said earlier, the two files together are about 1 GB, but even with -Xms4048m -Xmx4048m set for each Hot Rod server, the servers crashed.
Based on this POC we are planning our hardware purchase.
Please suggest how to manage memory better and how to get better performance.
Note:- The <infinispan/> XML for both servers is attached.
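For readers without the attachment, the relevant distributed-cache section of the configuration presumably looks something like this sketch (cache name and rehashWait value are illustrative; attribute names follow the Infinispan 4.x schema):

```xml
<namedCache name="exampleDistCache"> <!-- cache name is hypothetical -->
   <clustering mode="distribution">
      <!-- numOwners="2" as described above; rehashWait value illustrative -->
      <hash numOwners="2" rehashWait="120000"/>
      <l1 enabled="false"/>
   </clustering>
</namedCache>
```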
Even after setting numOwners=1, we got the exception below and one of the Hot Rod servers crashed:
2010-11-21 11:02:52,578 WARN [NAKACK] (OOB-8,InfinispanCluster,dlt21app-10508) dlt21app-10508: dropped message from dlt21app-950 (not in xmit_table), keys are [dlt21app-10508], view=[dlt21app-10508|2] [dlt21app-10508]
(...,InfinispanCluster,dlt21app-950) dlt21app-950: not member of view [dlt21app-10508|2] [dlt21app-10508]; discarding it
2010-11-21 11:02:32,467 WARN [FD] (OOB-15,InfinispanCluster,dlt21app-950) I was suspected by dlt21app-10508; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2010-11-21 11:02:52,745 ERROR [JoinTask] (Rehasher-dlt21app-950) Caught exception!
java.lang.IllegalStateException: Join cannot be complete without rehash to finish (node dlt21app-950 )
	at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:82)
	at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
	at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:88
Could running everything on a single server be the issue?
Attached is a jmap heap dump for 2 million records (actual size of the physical file: 266 MB). Before the load started, RES memory of the Infinispan process was 137 MB; after the run completed it was 2.2 GB.
Before-start heap dump:-
#bytes - 29 MB
After-completion heap dump:-
#bytes - 930 MB
Attached are heap dumps (taken through jmap for the Hot Rod server process alone); jmap_logs.zip contains the files below:-
test17_39_18.txt (Before Start)
test18_14_19.txt (After Completion)