1 Reply Latest reply on Dec 30, 2011 7:35 AM by sannegrinovero

How to use Infinispan 5.1 compute grid in my application?

kiran1260 Dec 29, 2011 7:00 PM

Hi,

Our project is a biometric project. Matching each biometric (fingerprint) record takes a long time, so we need to use multiple systems, and the data should be available in memory on all of them.

Data is organized as collections (state-wise), and we also have some filtering information (gender, other criteria). If an entry satisfies the filter, we need to match it against the fingerprint template.

Data organization:

1. Each person has one unique ID. Each person has 10 fingers, and each finger has some filtering criteria (patterns, core information); from these we created one index object.

2. Data is organized in collections, and each collection contains person information. So we created one Map<Long,Object>; here we store each person's ID and that person's filtering information in the map.

3. We have more than 30 collections.

4. Each collection has one unique ID.

5. The cache structure is Cache<String, Map<Long,Object>>.

6. A task should be split into 30 subtasks (because there are 30 collections), and the subtasks should run in parallel.

7. How can I create a task and its subtasks?

8. How can I distribute the data to two different nodes? (Suppose we have two nodes, and each node should act as both compute grid and data grid. Each node holds different person information in the collection map.

For example: with two collections, where coll1 has 10 persons in total and coll2 has 20, the person information should be partitioned (distributed) across node1 and node2, and each node contains both collection names.)

thanks & regards,

kiran

1. Re: How to use Infinispan 5.1 compute grid in my application?

sannegrinovero Dec 30, 2011 7:35 AM (in response to kiran1260)

Hi Kiran,
your project looks very interesting, but the description is too high level for me to understand what you need to do or how best to use Infinispan.
As general guidance, it looks like the Map/Reduce API could be useful, but since I didn't study the whole problem I can only point you to some options; it's hard to say what's "generally recommended" for your specific case.

With the Map/Reduce API you can define filter criteria that pick which key/values you want to process, and then the mapping function would be the expensive analysis you need to apply to each matching entry; for this to be simple enough you likely want to store all needed information for a single person in a single object value, not scatter it around (I'm guessing, but it seems simpler to me to start this way).
Then each task would be submitted to the Infinispan executor and indeed performed in parallel.
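
To make the shape of this concrete, here is a minimal sketch in plain Java. It is not the Infinispan Map/Reduce API: a JDK ExecutorService stands in for the Infinispan executor, and the names Person, matches, matchCollection and matchAll are hypothetical, invented only for illustration. It assumes one value object per person, a cheap gender filter applied first, and string equality as a placeholder for the real fingerprint comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MatchSketch {

    // Hypothetical value type: everything needed to match one person lives in one object.
    static final class Person {
        final long id;
        final String gender;
        final String template; // placeholder for the real fingerprint templates

        Person(long id, String gender, String template) {
            this.id = id;
            this.gender = gender;
            this.template = template;
        }
    }

    // Placeholder for the expensive fingerprint comparison (here: plain string equality).
    static boolean matches(Person p, String probe) {
        return p.template.equals(probe);
    }

    // One subtask per collection: apply the cheap filter first, then the expensive match.
    static List<Long> matchCollection(Map<Long, Person> people, String gender, String probe) {
        List<Long> hits = new ArrayList<>();
        for (Person p : people.values()) {
            if (p.gender.equals(gender) && matches(p, probe)) {
                hits.add(p.id);
            }
        }
        return hits;
    }

    // Submits one subtask per collection, then gathers the results per collection ID.
    static Map<String, List<Long>> matchAll(Map<String, Map<Long, Person>> collections,
                                            String gender, String probe) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, Future<List<Long>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Map<Long, Person>> e : collections.entrySet()) {
                final Map<Long, Person> people = e.getValue();
                Callable<List<Long>> subtask = () -> matchCollection(people, gender, probe);
                pending.put(e.getKey(), pool.submit(subtask));
            }
            Map<String, List<Long>> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<List<Long>>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get()); // blocks until that subtask is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<Long, Person> coll1 = new LinkedHashMap<>();
        coll1.put(1L, new Person(1L, "F", "abc"));
        coll1.put(2L, new Person(2L, "M", "abc"));
        Map<String, Map<Long, Person>> collections = new LinkedHashMap<>();
        collections.put("coll1", coll1);
        System.out.println(matchAll(collections, "F", "abc"));
    }
}
```

In a real grid the subtasks would run on the nodes that own the data rather than in a local thread pool; the sketch only shows the filter-then-match split and the one-subtask-per-collection fan-out.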

Another option would be to look into the Indexing options, to make the entries you store in the grid searchable by various criteria; this gives you more flexibility for different kinds of queries and advanced filtering, but index maintenance adds complexity and looks like overkill if you can solve this with Map/Reduce.
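
As a rough illustration of what "index maintenance" means here, the following plain-Java sketch keeps a secondary index from a filter value to person IDs next to the primary store. It is not how Infinispan's indexing works internally; the class FilterIndexSketch and its methods are hypothetical, and gender is assumed to be the only indexed criterion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FilterIndexSketch {

    // Primary store: person ID -> gender (a real value would carry much more).
    private final Map<Long, String> genderById = new HashMap<>();

    // Secondary index: gender -> IDs of the persons with that gender.
    private final Map<String, Set<Long>> idsByGender = new HashMap<>();

    // Every write has to update the index as well; this is the maintenance cost.
    void put(long id, String gender) {
        String old = genderById.put(id, gender);
        if (old != null) {
            idsByGender.get(old).remove(Long.valueOf(id)); // drop the stale index entry
        }
        idsByGender.computeIfAbsent(gender, g -> new TreeSet<>()).add(id);
    }

    // A lookup by criterion no longer needs to scan every entry.
    Set<Long> findByGender(String gender) {
        Set<Long> ids = idsByGender.get(gender);
        return ids == null ? Collections.<Long>emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

The lookup becomes cheap, but each put now touches two structures that must stay consistent, which is the extra complexity to weigh against a simple filter inside the mapping step.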