3 Replies Latest reply on Aug 13, 2010 5:59 PM by rhauch

    Does connector actively "scan" its source

    simon.g

      Hi all

       

      I have a basic question regarding the idea behind connectors. Are they supposed to scan their data source in an active way (e.g. from time to time), and become aware of new / changed content? The background of that question is twofold:

      • Would content that is not added via the JCR API be indexed for a full text search? (assuming there could be an index for binary content ...)
      • Would JCR events be triggered if content is added to the store without using the JCR API?

       

      If the above were the case, this would be a very powerful instrument indeed, but I am not sure whether this is the idea, since it is obviously rather tricky to implement. I guess for some connectors this should/could be possible (e.g. file system), but for others it might be impossible (e.g. DB with arbitrary structure).

       

      Best regards,

      Simon

        • 1. Re: Does connector actively "scan" its source
          rhauch

          Any connector that is accessing an "external" system that can be updated independently of the connector is responsible for "scanning" or "observing" its source for changes. Hopefully, these systems have an event mechanism that the connector can simply tie into, and then translate any incoming events into the same graph-level structure that the connector is projecting to ModeShape. In other cases, the connector will need to periodically poll for changes. And sometimes, it will be difficult to know what's changed, simply because the system doesn't make it easy (or possible). Generally speaking, each connector will likely do it differently, and some connectors don't yet do it though they should. [1][2]
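
The polling case described above can be illustrated with a minimal sketch. It assumes the external source is a plain directory and that comparing last-modified timestamps is enough to detect a change. `PollingScanner`, `snapshot`, `diff` and the five-second interval are hypothetical names and values chosen for illustration; none of them are part of ModeShape or its connector API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not a ModeShape class): finds changes by comparing directory snapshots. */
class PollingScanner {

    /** Snapshot of one directory (non-recursive): file name -> last-modified timestamp. */
    static Map<String, Long> snapshot(File dir) {
        Map<String, Long> result = new TreeMap<>();
        File[] children = dir.listFiles();
        if (children != null) { // listFiles() returns null if dir is missing or unreadable
            for (File child : children) {
                result.put(child.getName(), child.lastModified());
            }
        }
        return result;
    }

    /** Compares two snapshots and returns one line per detected change. */
    static List<String> diff(Map<String, Long> before, Map<String, Long> after) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            Long old = before.get(entry.getKey());
            if (old == null) {
                changes.add("CREATED " + entry.getKey());
            } else if (!old.equals(entry.getValue())) {
                changes.add("MODIFIED " + entry.getKey());
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                changes.add("DELETED " + name);
            }
        }
        return changes;
    }

    public static void main(String[] args) throws InterruptedException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        Map<String, Long> previous = snapshot(dir);
        while (true) {
            Thread.sleep(5000); // polling interval, picked arbitrarily for this sketch
            Map<String, Long> current = snapshot(dir);
            for (String change : diff(previous, current)) {
                // A real connector would translate this into a graph-level event instead.
                System.out.println(change);
            }
            previous = current;
        }
    }
}
```

Keeping `diff` separate from the file system access makes the change detection easy to test. The same comparison could in principle be applied to any source that can produce a name-to-version map.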

           

          By default, ModeShape will index all content (exposed by the connectors) upon initialization, and that content is thus searchable/queryable. All other changes to the indexes are based upon the events.  So as long as the connector generates the "correct" events [3], the indexes will also reflect the content.

           

          For example, when a connector changes content, the connector broadcasts the events for the change (actually, these are just the frozen requests that were submitted to the connector, after the connector has filled in the 'actuals' information), and the search engine receives and processes these events by updating the indexes. In some cases, the indexes are simply updated with the information in the event (e.g., a new property was added, or a new node was created, or a node/branch was deleted). However, in other cases, the events don't contain sufficient information, so the search engine will reindex that "changed" content.

           

          [1] The SVN connector doesn't yet do an 'update' or 'log' to figure out what changed, though it could. We've just not implemented this yet.

          [2] The file system changes in Java 7 will make it much easier to watch (in a platform-independent manner) for changes to files and folders.

          [3] The "correct" events are simply those events that would have been generated if the same changes were made by the connector. There's nothing special in the events regarding the search indexes.
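
To make footnote [2] concrete: the Java 7 facility is the `java.nio.file.WatchService` API. The sketch below shows how a file system connector might use it to observe a single directory. The `DirectoryWatcher` class and its `nextChanges` method are hypothetical and not taken from ModeShape; only the `java.nio.file` calls are standard.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a ModeShape class): observes one directory, non-recursively. */
class DirectoryWatcher implements AutoCloseable {
    private final WatchService watcher;

    DirectoryWatcher(Path dir) throws IOException {
        watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    /** Waits up to the timeout for the next batch of changes; returns an empty list on timeout. */
    List<String> nextChanges(long timeout, TimeUnit unit) throws InterruptedException {
        List<String> changes = new ArrayList<>();
        WatchKey key = watcher.poll(timeout, unit);
        if (key == null) {
            return changes; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                changes.add("OVERFLOW"); // events were lost; a connector would have to rescan
            } else {
                // context() is the path of the changed entry, relative to the watched directory
                changes.add(event.kind().name() + " " + event.context());
            }
        }
        key.reset(); // re-arm the key so that later events are queued again
        return changes;
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryWatcher w = new DirectoryWatcher(dir)) {
            while (true) {
                for (String change : w.nextChanges(10, TimeUnit.SECONDS)) {
                    System.out.println(change);
                }
            }
        }
    }
}
```

How quickly events arrive depends on the platform: some JDK implementations use native notifications, while others fall back to internal polling, so a connector should not assume immediate delivery.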

          • 2. Re: Does connector actively "scan" its source
            simon.g

            Thanks a lot for this explanation. On the topic of events, I was trying this out with a repository configuration like below and the source code attached.

             

            I registered an EventListener on "/" and created a node after that, but received no events. Am I doing something wrong?

             

            Best regards and thanks for any help,

            Simon

             

            <?xml version="1.0" encoding="UTF-8"?>
            <configuration xmlns:mode="http://www.modeshape.org/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
                <mode:sources jcr:primaryType="nt:unstructured">
                    <mode:source jcr:name="abc-source"
                        mode:classname="org.modeshape.graph.connector.inmemory.InMemoryRepositorySource"
                        mode:retryLimit="3" mode:defaultWorkspaceName="workspace1" />
                </mode:sources>

                <mode:repositories>
                    <mode:repository jcr:name="abc">
                        <mode:source>abc-source</mode:source>
                        <jcr:nodeTypes mode:resource="/nodetypes.cnd" />
                    </mode:repository>
                </mode:repositories>
            </configuration>

            • 3. Re: Does connector actively "scan" its source
              rhauch

              Your configuration file is fine, but there are two small but tricky issues with your code that kept it from working as expected.

               

              First, the "listen(...)" method creates a new session, registers a listener on that session, and then immediately logs out of it. Listeners are tied to the session through which they're registered, and are automatically unregistered once the session is closed. So you need to keep the listeners' session open for as long as you want to use the listeners.[1]

               

              The second issue with your code is that the events will actually be delivered asynchronously to the save. In other words, the call to the session's "save()" method will return before the events caused by those changes are delivered to the listeners.[2] Thus, even if you didn't log out of the listener's session, your main(...) method -- and thus the JVM process -- will likely complete and terminate normally before your listeners get a chance to receive the events.
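
The two issues can be reproduced without a repository. The sketch below is not the attached program and does not use the JCR API: `ToyRepository` and `ToySession` are hypothetical stand-ins that mimic only the two behaviors described above (listeners disappear when their session logs out, and events are delivered on another thread after `save` returns). In real JCR code the corresponding calls would be `ObservationManager.addEventListener(...)`, `Session.save()` and `Session.logout()`. The `CountDownLatch` is one possible way to keep `main` alive until the event arrives.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for a repository; it is not a JCR or ModeShape class. */
class ToyRepository {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    /** Stand-in for a session: listeners registered through it are removed on logout. */
    class ToySession {
        private final List<Consumer<String>> mine = new CopyOnWriteArrayList<>();

        void addListener(Consumer<String> listener) {
            mine.add(listener);
            listeners.add(listener);
        }

        /** Returns immediately; the event is delivered later on the dispatcher thread. */
        void save(String change) {
            dispatcher.execute(() -> {
                for (Consumer<String> listener : listeners) {
                    listener.accept(change);
                }
            });
        }

        void logout() {
            listeners.removeAll(mine);
            mine.clear();
        }
    }

    ToySession login() {
        return new ToySession();
    }

    void shutdown() {
        dispatcher.shutdown();
    }
}

class ListenerDemo {
    /** Returns true if the listener received the event within two seconds. */
    static boolean run(boolean logoutListenerSessionEarly) throws InterruptedException {
        ToyRepository repo = new ToyRepository();
        ToyRepository.ToySession listenerSession = repo.login();
        CountDownLatch received = new CountDownLatch(1);
        listenerSession.addListener(event -> received.countDown());
        if (logoutListenerSessionEarly) {
            listenerSession.logout(); // first issue: the listener is gone before anything is saved
        }

        ToyRepository.ToySession writer = repo.login();
        writer.save("node added: /a"); // returns before the event has been delivered

        // Second issue: wait for delivery instead of letting the caller return right away.
        boolean delivered = received.await(2, TimeUnit.SECONDS);

        writer.logout();
        listenerSession.logout();
        repo.shutdown();
        return delivered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("listener session kept open:  " + run(false));
        System.out.println("listener session logged out: " + run(true));
    }
}
```

A latch with a timeout suits a short test program. A long-running application would more likely keep the listener session open for its whole lifetime and log out only during shutdown.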

               

              I've attached a modified version of your original application, and this program runs successfully with your configuration file. Note that, in addition to fixing the two issues mentioned above, the code also shuts down ModeShape correctly. Note the static list of sessions used for listeners and the static list of RepositoryFactory instances. (Each of these lists will only have a single element as currently used in this test, but I coded them to work no matter how many times the 'init()' and 'listener()' methods are called.)

               

              Hopefully this helps!

               

              Best regards,

               

              Randall

               

              [1] This is the way the JCR spec was written, and it may seem strange from some viewpoints, because you generally want listeners around for a long time -- often for the lifetime of your app -- and it seems odd to have a session open for that long. However, the benefit of registering listeners on sessions is that the same authentication/authorization mechanism is used throughout JCR.

               

              [2] Again, this is per the JCR specification, but IIUC there are two main justifications for this:

              1. listeners cannot block or prevent save operations, and
              2. once an event is received, the changes have already been committed.