4 Replies Latest reply on Sep 24, 2015 2:46 PM by nadirx

    How Infinispan works? (map of existing code for new developers)

    alexmaked

      September 2015 - I'm trying to find out how the essential parts of Infinispan are constructed.

       

      Looking into the codebase, I suspect the main parts are in the core/ subtree, but that still contains about half of the Java code. It's divided into affinity/, atomic/, batch/ and other packages, but I don't see any place where the role of those packages is described. What are the essential parts of the kernel system? What are the main interfaces to that kernel? Which libraries does the code depend upon - and why does it depend on them?

       

      I believe answers to questions like these would be quite beneficial for any new developer - or engineering user - of Infinispan. They are probably not easy to answer briefly, but the answers would still be very valuable.

        • 1. Re: How Infinispan works? (map of existing code for new developers)
          nadirx

          That's actually an interesting project. Is there a particular format you'd like to be used for this? We would also need a volunteer.

          • 2. Re: How Infinispan works? (map of existing code for new developers)
            rvansa

            You won't find any kernel package, but these are the parts you could start looking into.

            CacheImpl (the implementation of Cache, AdvancedCache etc.): transforms calls such as cache.put() into a Command (e.g. PutKeyValueCommand) and passes it to the InterceptorChain. The InterceptorChain is made of several CommandInterceptors, selected according to the current configuration. The interceptor chain uses the visitor pattern to handle all the commands. At the end of the chain there is the CallInterceptor, which invokes the perform() method on the command; the return value then goes back through the interceptor chain.

            Some commands do not go through the interceptor chain (these commands do not implement the VisitableCommand interface) - their perform() method is invoked directly instead of entering the chain.
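
            To make the flow above concrete, here is a minimal sketch in plain Java, with no Infinispan dependency. It borrows the names VisitableCommand and CallInterceptor from the description, but everything else (SimpleCache, PutCommand, LoggingInterceptor, the Visitor interface with a single visitPut method) is a hypothetical stand-in, and the real classes have different signatures. It only illustrates the shape: the cache wraps a call in a command, the command visits each interceptor in turn, the last interceptor calls perform(), and the return value travels back up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InterceptorChainSketch {

    // One visit method per command type; every interceptor is a visitor.
    interface Visitor {
        Object visitPut(PutCommand command);
    }

    // Commands that travel through the chain accept a visitor (hypothetical, simplified).
    interface VisitableCommand {
        Object perform();
        Object acceptVisitor(Visitor visitor);
    }

    static final class PutCommand implements VisitableCommand {
        final Map<String, String> data;
        final String key;
        final String value;

        PutCommand(Map<String, String> data, String key, String value) {
            this.data = data;
            this.key = key;
            this.value = value;
        }

        @Override
        public Object perform() {
            return data.put(key, value); // returns the previous value, like Map.put
        }

        @Override
        public Object acceptVisitor(Visitor visitor) {
            return visitor.visitPut(this); // double dispatch: the command picks the visit method
        }
    }

    abstract static class Interceptor implements Visitor {
        Interceptor next;

        Object invokeNext(VisitableCommand command) {
            return command.acceptVisitor(next);
        }
    }

    // An example interceptor that acts before and after the rest of the chain.
    static final class LoggingInterceptor extends Interceptor {
        final List<String> log;

        LoggingInterceptor(List<String> log) {
            this.log = log;
        }

        @Override
        public Object visitPut(PutCommand command) {
            log.add("before put " + command.key);
            Object result = invokeNext(command);
            log.add("after put " + command.key);
            return result;
        }
    }

    // Last in the chain: actually performs the command.
    static final class CallInterceptor extends Interceptor {
        @Override
        public Object visitPut(PutCommand command) {
            return command.perform();
        }
    }

    static final class SimpleCache {
        private final Map<String, String> data = new HashMap<>();
        private final Interceptor head;

        SimpleCache(List<Interceptor> chain) {
            for (int i = 0; i + 1 < chain.size(); i++) {
                chain.get(i).next = chain.get(i + 1); // link each interceptor to its successor
            }
            head = chain.get(0);
        }

        String put(String key, String value) {
            VisitableCommand command = new PutCommand(data, key, value);
            return (String) command.acceptVisitor(head);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SimpleCache cache = new SimpleCache(
                Arrays.asList(new LoggingInterceptor(log), new CallInterceptor()));
        cache.put("k", "v1");
        System.out.println(cache.put("k", "v2")); // prints "v1", the previous value
        System.out.println(log);
    }
}
```

            A command that bypasses the chain would, in this sketch, simply have perform() called on it directly without going through acceptVisitor().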

            Regarding libraries: Infinispan uses JGroups to execute RPCs on remote nodes, and JBoss Marshalling for the object <-> byte[] mapping.

             

            And yes, documenting this in a more structured way (and describing the 99% I haven't mentioned above) would be awesome

            • 3. Re: How Infinispan works? (map of existing code for new developers)
              alexmaked

              > Is there a particular format you'd like to be used for this ?

               

              Yes, I think answering questions (with potentially long answers) is good enough for now. Here are some immediate questions.

               

              So, I have a general understanding of what Infinispan is, and I would summarize that as "hash table for clusters, with cluster-specific good and bad features". Now, looking at the code on GitHub I see 357k+ lines in *.java files (outside of test directories) - that's quite a bit. I'm reducing the scope by going to the core/ subtree - there are still 155k+ lines of code; I went to core because of an "educated guess" about where the main parts of the project are. Now, how can I get more immediate information about what's there?

               

              The structure is (a bunch of directories) + 10 top-level Java files (two of them, AbstractDelegatingCache*, are marked as deprecated). There is also a package-info file, which kind of confirms that this is the core part but doesn't provide any details. So I keep looking.

               

              Opening the Cache interface. Lots of Javadoc... at a quite specific level. Ugh, not quite what's needed, even though it's appropriate for the place - I'm just looking for different things. The definition of Cache involves BasicCache, BatchingCache, FilteringListenable - so it relies on a lot of functionality that exists elsewhere. Two are from the *.commons.api package, the other from *.notifications... Does that mean this interface depends on classes in those other packages? Then it's not "a core by itself", but rather "core together with those other classes". So we have tight coupling here. Hmm. "commons" is not even in the list of core/ subdirectories - so Cache depends on something outside of core... Not good for investigations.

               

              So maybe there is a bunch of utility classes that the main classes depend on. Then those utilities are a sort of prerequisite for understanding the main core classes. Where are they? The util/ subtree's package-info says "utilities... not specific... helpers". Again, not very clear.

               

              The questions so far: is there a "main service" which Infinispan provides - is there an interface class for that? Suppose I'm using Infinispan as a library - where is the entry point? What is the list of "essential" services? I assume there should be some core services, with all the other convenience functions built on top of them, plus some other code that is added as necessary but is not required for building and running the core functions themselves.

               

              If this is not the case, then we probably have a "monolithic" structure - a lot of cross-dependent code, which isn't really split into modules. Could be justified, but still inconvenient - particularly for learning.

               

              In any case, there should be some logic behind why the code was split into those subpackages. It is possible to learn each subpackage by tracing where its classes are used, but that is quite inconvenient for initial learning. So - can each package-info.java file at least give a better explanation of why that particular subpackage exists and where it is used?

               

              It would be very good if each package explained its classes at a higher level - like, "we need this iterator because this particular function is not available in the standard libraries but is necessary for...", etc.

               

              To summarize - what function does each particular package implement? If it's a store of code to be used externally - what are the functions and where are they? If there are any interfaces for several classes as a whole - what are they and where are they used?

              • 4. Re: How Infinispan works? (map of existing code for new developers)
                nadirx

                Ok, let's start from the most important top-level modules:

                 

                - commons: contains common classes/interfaces/utilities shared by both the embedded (core) and remote (client/hotrod-client) modules

                - core: contains the embedded cache manager implementation

                - client/hotrod-client: the hotrod client

                 

                Let's skip all the other modules for now and concentrate on core.

                The main entry point is org.infinispan.manager.DefaultCacheManager. This object manages the lifecycle, configuration and components that are used by caches. It can be initialized with a programmatic configuration or a declarative one (i.e. XML). Using a declarative configuration is just an alternate way to construct a programmatic configuration object.

                Once we have a configuration, a GlobalComponentRegistry is initialized: it holds a bunch of common (cross-cache) components. Examples of these are the TransportManager and the thread pools. Then the individual caches are initialized. Each one of these also has a local ComponentRegistry for things like the PersistenceManager, EvictionManager, etc., depending on the configuration. These objects also form the interceptor stack. When you invoke an operation on a cache, the operation traverses the interceptor stack all the way through and back again. Each of the interceptors, and its position in the stack, determines some kind of behaviour. So if you have configured persistence, there will be a persistence interceptor which passes the operations to the PersistenceManager, which takes care of reading/writing data to any cache stores that may be in use. Similarly for other components.

                At the core of a cache is a DataContainer which actually holds the data (inside a customized ConcurrentHashMap). Also of interest is the transport layer which, when using clustered caches, determines which of the other nodes to send commands to. Think of this part as the interceptor chain pausing on the local node and continuing on a remote one, and then resuming once the remote node has sent back a reply.
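
                As a rough illustration of how configuration shapes the stack, here is a small plain-Java sketch (no Infinispan dependency). All names in it (CacheStackSketch, buildStack, PersistenceInterceptor, DataContainerInterceptor) are hypothetical, a plain Map stands in for a cache store, and the load-on-miss and write-through behaviour is an assumption made for the example rather than a description of the real interceptors. The only point is that a persistence interceptor is present in the stack when persistence is configured, and that the in-memory container sits at the bottom.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheStackSketch {

    // Default behaviour of an interceptor: pass the operation down the stack.
    abstract static class Interceptor {
        Interceptor next;

        String get(String key) {
            return next.get(key);
        }

        void put(String key, String value) {
            next.put(key, value);
        }
    }

    // Only present when persistence is configured (assumed behaviour, see above).
    static final class PersistenceInterceptor extends Interceptor {
        private final Map<String, String> store; // stand-in for a cache store

        PersistenceInterceptor(Map<String, String> store) {
            this.store = store;
        }

        @Override
        String get(String key) {
            String value = next.get(key);
            if (value == null) {
                value = store.get(key);   // miss in memory: try the store
                if (value != null) {
                    next.put(key, value); // keep the loaded entry in memory
                }
            }
            return value;
        }

        @Override
        void put(String key, String value) {
            next.put(key, value);
            store.put(key, value);        // write-through to the store
        }
    }

    // Bottom of the stack: the in-memory data container.
    static final class DataContainerInterceptor extends Interceptor {
        private final ConcurrentHashMap<String, String> container = new ConcurrentHashMap<>();

        @Override
        String get(String key) {
            return container.get(key);
        }

        @Override
        void put(String key, String value) {
            container.put(key, value);
        }
    }

    // The configuration decides which interceptors exist and in which order.
    static Interceptor buildStack(boolean persistenceEnabled, Map<String, String> store) {
        List<Interceptor> stack = new ArrayList<>();
        if (persistenceEnabled) {
            stack.add(new PersistenceInterceptor(store));
        }
        stack.add(new DataContainerInterceptor());
        for (int i = 0; i + 1 < stack.size(); i++) {
            stack.get(i).next = stack.get(i + 1);
        }
        return stack.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("a", "1");

        Interceptor persistent = buildStack(true, store);
        Interceptor memoryOnly = buildStack(false, store);

        System.out.println(persistent.get("a")); // prints "1", loaded from the store
        System.out.println(memoryOnly.get("a")); // prints "null", no persistence interceptor
    }
}
```

                In a clustered setup one could imagine a further interceptor in the same position that forwards the operation to another node and waits for the reply; that part is left out here.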

                 

                I hope this is enough to get you started.