4 Replies Latest reply on Jul 17, 2012 1:23 AM by nl

    (msoffice) sequencer issues

    nl

      Hi,

       

      I am using modeshape-2.8.1.final and playing around a bit with sequencers. For the office sequencer I notice two issues that do not occur with others (e.g. the image sequencer):

       

      a) My sequencer nodes for office files do not get the mixin type "mode:derived".

      After storing the well-known jcr-spec.doc the sequencer root looks as follows:

       

      {noformat}

      sequenced jcr:primaryType=nt:unstructured

         msoffice jcr:primaryType=nt:unstructured

           Test jcr:primaryType=nt:unstructured

             1d jcr:primaryType=nt:unstructured

               jcr-spec.doc jcr:primaryType=nt:unstructured

                 msoffice:metadata jcr:primaryType=msoffice:metadata

                   - jcr:mimeType="application/msword"

                   - msoffice:author="Peeter Piegaze"

                   - msoffice:characters=440120

                   - msoffice:created=2012-07-05T12:53:00.000+02:00

                   - msoffice:creating_application="Microsoft Office Word"

                   - msoffice:last_printed=2009-06-09T11:16:00.000+02:00

                   - msoffice:pages=127

                   - msoffice:revision="2"

                   - msoffice:saved=2012-07-05T12:53:00.000+02:00

                   - msoffice:template="Normal.dotm"

                   - msoffice:thumbnail=binary (2,77KB, SHA1=bc4a3f709477f92af340da935b47e97ec64fe94b)

                   - msoffice:title="Content Repository API for Java™ Technology"

                   - msoffice:total_editing_time=0

                   - msoffice:words=69860

      {noformat}

       

      whereas for my image it looks like this:

       

      {noformat}

      sequenced jcr:primaryType=nt:unstructured

         images jcr:primaryType=nt:unstructured

           Test jcr:primaryType=nt:unstructured

             1e jcr:primaryType=nt:unstructured

               image:metadata jcr:primaryType=image:metadata jcr:mixinTypes=[mode:derived]

                 - image:bitsPerPixel=24

                 - image:formatName="JPEG"

                 - image:height=768

                 - image:numberOfImages=1

                 - image:physicalHeightDpi=192

                 - image:physicalHeightInches=4

                 - image:physicalWidthDpi=192

                 - image:physicalWidthInches=5

                 - image:progressive=false

                 - image:width=1024

                 - jcr:mimeType="image/jpeg"

                 - mode:derivedAt=2012-07-07T11:10:57.563Z

                 - mode:derivedFrom=/files/DocumentManagerTest/1e/Jellyfish.jpg

      {noformat}

       

      b) If I update my document with a new version (basically I changed the document title), I get the following exception:

       

      {code}

      2012-07-07 13:00:52,536 ERROR [modeshape-1-thread-3] org.modeshape.repository.sequencer.SequencingService: Error finding sequencers to run against node 2012-07-07T11:00:52.098Z @nl [store] - 1 changes

      java.lang.NullPointerException

          at org.modeshape.graph.Graph$BatchResultsNode.addProperty(Graph.java:7290)

          at org.modeshape.graph.Graph$BatchResults.<init>(Graph.java:7151)

          at org.modeshape.graph.Graph$Batch.execute(Graph.java:4972)

          at org.modeshape.repository.sequencer.StreamSequencerAdapter.saveOutput(StreamSequencerAdapter.java:339)

          at org.modeshape.repository.sequencer.StreamSequencerAdapter.execute(StreamSequencerAdapter.java:232)

          at org.modeshape.repository.sequencer.SequencingService.processChange(SequencingService.java:498)

          at org.modeshape.repository.sequencer.SequencingService$RepositoryObserver$1.run(SequencingService.java:666)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

          at java.lang.Thread.run(Unknown Source)

      {code}

       

      Again, doing the same with images works as expected. Both imports/updates are done with the very same code.

       

      My sequencer configuration looks as follows:

       

      {code:xml}

      <mode:sequencer jcr:name="MS Office File Sequencer" mode:classname="org.modeshape.sequencer.msoffice.MSOfficeMetadataSequencer">

                  <mode:description>Sequences Microsoft Office documents and presentations under '/files', extracting summary information and structure.</mode:description>       

                  <mode:pathExpression>store:default:/files(//)(*.(xls|ppt|doc|docx)[*])/jcr:content[@jcr:data] => store:default:/sequenced/msoffice/$1 </mode:pathExpression>

              </mode:sequencer>

              <mode:sequencer jcr:name="Image Sequencer" mode:classname="org.modeshape.sequencer.image.ImageMetadataSequencer">

                  <mode:description>Sequences images '/files', extracting summary information and structure.</mode:description>       

                  <mode:pathExpression>store:default:/files(//)(*.(jpg|gif|png|tiff|bmp)[*])/jcr:content[@jcr:data] => store:default:/sequenced/images/$1</mode:pathExpression>

              </mode:sequencer>

      {code}

       

      Does anyone have a clue on this?

       

      Thanks a lot,

       

      Niels

        • 1. Re: msoffice sequencer issues
          nl

          Another (strange) thing:

           

          Though the pathExpressions for both sequencers are nearly the same, I see

           

          /sequenced/msoffice/Test/1d/jcr-spec.doc/msoffice:metadata (full path) as sequenced node path for msoffice, but only

          /sequenced/images/Test/1e/image:metadata (missing filename) as node for images.

           

          Now: what's a reliable way to get the sequencer information for my imported file? Either mode:derived is missing or the path is incorrect.

          • 2. Re: (msoffice) sequencer issues
            nl

            I think I know now what the problem is:

             

             

            The MSO Sequencer creates an entry in its output like "{}jcr-spec.doc/{http://www.modeshape.org/msoffice/1.0}metadata".

            See MSOfficeMetadataSequencer.java (lines 113-115):

             

            {code}

            Path docNode = pathFactory.createRelativePath(docName);

            Path metadataNode = pathFactory.create(docNode, MSOfficeMetadataLexicon.METADATA_NODE);

            output.setProperty(metadataNode, JcrLexicon.MIMETYPE, mimeType);

            {code}

             

            But the StreamSequencerAdapter only adds mode:derived for paths with a length <= 1.

            See StreamSequencerAdapter.java (lines 405-407):

             

            {code}

            if (targetNodePath.size() <= 1 && addDerivedMixin && pathsOfTopLevelNodes.add(absolutePath)) {

                            properties = addDerivedProperties(properties, context, derivedFromPath);

            }

            {code}

             

            Therefore mode:derived is never added to sequenced office nodes.
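
            To make the effect of that check concrete, here is a minimal stand-alone sketch. It does not use the ModeShape API: the class DerivedCheckSketch and the method wouldAddDerived are hypothetical names, and paths are modelled as plain lists of segment strings. Only the boolean condition itself is taken from the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, NOT ModeShape code: paths are plain segment lists.
public class DerivedCheckSketch {

    // Mirrors the condition quoted from StreamSequencerAdapter:
    // only output paths with at most one segment qualify for mode:derived.
    public static boolean wouldAddDerived(List<String> targetNodePath,
                                          boolean addDerivedMixin,
                                          Set<String> pathsOfTopLevelNodes,
                                          String absolutePath) {
        return targetNodePath.size() <= 1
                && addDerivedMixin
                && pathsOfTopLevelNodes.add(absolutePath);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();

        // Image sequencer output: a single segment relative to the output root.
        List<String> imagePath = Arrays.asList("image:metadata");

        // MS Office sequencer output: document name plus metadata node, two segments.
        List<String> officePath = Arrays.asList("jcr-spec.doc", "msoffice:metadata");

        System.out.println(wouldAddDerived(imagePath, true, seen,
                "/sequenced/images/Test/1e/image:metadata"));
        System.out.println(wouldAddDerived(officePath, true, seen,
                "/sequenced/msoffice/Test/1d/jcr-spec.doc/msoffice:metadata"));
    }
}
```

            Under these assumptions the single-segment image path passes the check and the two-segment office path does not, which matches what I see in the repository.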

             

            If I change the first snippet to:

             

            {code}

            Path metadataNode = pathFactory.createRelativePath(MSOfficeMetadataLexicon.METADATA_NODE);

            output.setProperty(metadataNode, JcrLexicon.MIMETYPE, mimeType);

            {code}

             

            it works as expected. (Note: The image sequencer works the same way).

             

            This looks like a bug to me.

            • 3. Re: (msoffice) sequencer issues
              rhauch

              Hi. I don't recall why that check is there, but it does look like a bug. Would you mind filing a bug in our JIRA?

              • 4. Re: (msoffice) sequencer issues
                nl

                See MODE-1559

                 

                Thanks.