4 replies. Latest reply on Jul 17, 2012 1:23 AM by nl

(msoffice) sequencer issues

nl Jul 7, 2012 2:57 PM

Hi,

I am using ModeShape 2.8.1.Final and experimenting with sequencers. With the MS Office sequencer I see two issues that do not occur with other sequencers (e.g., the image sequencer):

a) The sequenced nodes for Office files do not get the mixin type "mode:derived".

After storing the well-known jcr-spec.doc, the sequencing output looks like this:

{noformat}
 sequenced jcr:primaryType=nt:unstructured
   msoffice jcr:primaryType=nt:unstructured
     Test jcr:primaryType=nt:unstructured
       1d jcr:primaryType=nt:unstructured
         jcr-spec.doc jcr:primaryType=nt:unstructured
           msoffice:metadata jcr:primaryType=msoffice:metadata
             - jcr:mimeType="application/msword"
             - msoffice:author="Peeter Piegaze"
             - msoffice:characters=440120
             - msoffice:created=2012-07-05T12:53:00.000+02:00
             - msoffice:creating_application="Microsoft Office Word"
             - msoffice:last_printed=2009-06-09T11:16:00.000+02:00
             - msoffice:pages=127
             - msoffice:revision="2"
             - msoffice:saved=2012-07-05T12:53:00.000+02:00
             - msoffice:template="Normal.dotm"
             - msoffice:thumbnail=binary (2,77KB, SHA1=bc4a3f709477f92af340da935b47e97ec64fe94b)
             - msoffice:title="Content Repository API for Java™ Technology"
             - msoffice:total_editing_time=0
             - msoffice:words=69860
{noformat}

whereas for my image it looks like this (note the mode:derived mixin on image:metadata):

{noformat}
 sequenced jcr:primaryType=nt:unstructured
   images jcr:primaryType=nt:unstructured
     Test jcr:primaryType=nt:unstructured
       1e jcr:primaryType=nt:unstructured
         image:metadata jcr:primaryType=image:metadata jcr:mixinTypes=[mode:derived]
           - image:bitsPerPixel=24
           - image:formatName="JPEG"
           - image:height=768
           - image:numberOfImages=1
           - image:physicalHeightDpi=192
           - image:physicalHeightInches=4
           - image:physicalWidthDpi=192
           - image:physicalWidthInches=5
           - image:progressive=false
           - image:width=1024
           - jcr:mimeType="image/jpeg"
           - mode:derivedAt=2012-07-07T11:10:57.563Z
           - mode:derivedFrom=/files/DocumentManagerTest/1e/Jellyfish.jpg
{noformat}

b) If I update the document with a new version (I only changed the document title), I get the following exception:

{code}
2012-07-07 13:00:52,536 ERROR [modeshape-1-thread-3] org.modeshape.repository.sequencer.SequencingService: Error finding sequencers to run against node 2012-07-07T11:00:52.098Z @nl [store] - 1 changes
java.lang.NullPointerException
    at org.modeshape.graph.Graph$BatchResultsNode.addProperty(Graph.java:7290)
    at org.modeshape.graph.Graph$BatchResults.<init>(Graph.java:7151)
    at org.modeshape.graph.Graph$Batch.execute(Graph.java:4972)
    at org.modeshape.repository.sequencer.StreamSequencerAdapter.saveOutput(StreamSequencerAdapter.java:339)
    at org.modeshape.repository.sequencer.StreamSequencerAdapter.execute(StreamSequencerAdapter.java:232)
    at org.modeshape.repository.sequencer.SequencingService.processChange(SequencingService.java:498)
    at org.modeshape.repository.sequencer.SequencingService$RepositoryObserver$1.run(SequencingService.java:666)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
{code}

Doing the same with images works as expected. Both imports and updates are done with exactly the same code.

My sequencer configuration looks as follows:

{code:xml}
<mode:sequencer jcr:name="MS Office File Sequencer" mode:classname="org.modeshape.sequencer.msoffice.MSOfficeMetadataSequencer">
    <mode:description>Sequences Microsoft Office documents and presentations under '/files', extracting summary information and structure.</mode:description>
    <mode:pathExpression>store:default:/files(//)(*.(xls|ppt|doc|docx)[*])/jcr:content[@jcr:data] => store:default:/sequenced/msoffice/$1</mode:pathExpression>
</mode:sequencer>
<mode:sequencer jcr:name="Image Sequencer" mode:classname="org.modeshape.sequencer.image.ImageMetadataSequencer">
    <mode:description>Sequences images under '/files', extracting summary information and structure.</mode:description>
    <mode:pathExpression>store:default:/files(//)(*.(jpg|gif|png|tiff|bmp)[*])/jcr:content[@jcr:data] => store:default:/sequenced/images/$1</mode:pathExpression>
</mode:sequencer>
{code}

Does anyone have an idea what is going on here?

Thanks a lot,

Niels

1. Re: (msoffice) sequencer issues

nl Jul 7, 2012 2:36 PM (in response to nl)

Another (strange) thing:

Although the path expressions for both sequencers are nearly identical, the sequenced node path for the Office file includes the file name:

/sequenced/msoffice/Test/1d/jcr-spec.doc/msoffice:metadata

whereas the node for the image does not:

/sequenced/images/Test/1e/image:metadata

So what is a reliable way to find the sequenced information for an imported file? Either mode:derived is missing, or the path is inconsistent.
2. Re: (msoffice) sequencer issues

nl Jul 16, 2012 1:48 PM (in response to nl)
I think I now know what the problem is:

The MS Office sequencer writes its output to a relative path with two segments (the document name, then the metadata node), like "{}jcr-spec.doc/{http://www.modeshape.org/msoffice/1.0}metadata".
See MSOfficeMetadataSequencer.java (lines 113-115):

{code}
Path docNode = pathFactory.createRelativePath(docName);
Path metadataNode = pathFactory.create(docNode, MSOfficeMetadataLexicon.METADATA_NODE);
output.setProperty(metadataNode, JcrLexicon.MIMETYPE, mimeType);
{code}

However, StreamSequencerAdapter only adds mode:derived when the output node's relative path has at most one segment.
See StreamSequencerAdapter.java (lines 405-407):

{code}
if (targetNodePath.size() <= 1 && addDerivedMixin && pathsOfTopLevelNodes.add(absolutePath)) {
    properties = addDerivedProperties(properties, context, derivedFromPath);
}
{code}

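Because the Office sequencer's output path always has two segments (the document name plus msoffice:metadata), mode:derived is never added to sequenced Office nodes.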
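To illustrate why the two sequencers behave differently, here is a stripped-down, stdlib-only model of that condition. It is a sketch, not ModeShape code: the class DerivedMixinCheck and the method addsDerivedMixin are hypothetical, a path is modeled as a plain list of segment names instead of a ModeShape Path, and the real check's pathsOfTopLevelNodes.add(absolutePath) clause is left out.

```java
import java.util.List;

// Hypothetical stand-in for the check in StreamSequencerAdapter (not ModeShape code).
public class DerivedMixinCheck {

    // Mirrors "targetNodePath.size() <= 1 && addDerivedMixin": only output nodes
    // whose relative path has at most one segment get the mode:derived mixin.
    // (The real code also requires pathsOfTopLevelNodes.add(absolutePath); omitted here.)
    static boolean addsDerivedMixin(List<String> relativeOutputPath, boolean addDerivedMixin) {
        return relativeOutputPath.size() <= 1 && addDerivedMixin;
    }

    public static void main(String[] args) {
        // Image sequencer: the metadata node is the only segment.
        List<String> imageOutput = List.of("image:metadata");
        // MS Office sequencer (2.8.1): document name plus metadata node, two segments.
        List<String> officeOutput = List.of("jcr-spec.doc", "msoffice:metadata");

        System.out.println("image:    " + addsDerivedMixin(imageOutput, true));    // true
        System.out.println("msoffice: " + addsDerivedMixin(officeOutput, true));   // false
    }
}
```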

If I change the first snippet to:

{code}
Path metadataNode = pathFactory.createRelativePath(MSOfficeMetadataLexicon.METADATA_NODE);
output.setProperty(metadataNode, JcrLexicon.MIMETYPE, mimeType);
{code}

it works as expected. (Note: the image sequencer already creates its output node this way.)
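The same change would also explain the path difference noted in the first reply. The sketch below assumes, based only on the two trees in the original post, that the final node path is the path expression's target (e.g. /sequenced/msoffice/ plus $1) followed by the sequencer's relative output path. The class OutputLocation and its resolve method are hypothetical illustrations, not ModeShape API.

```java
import java.util.List;

// Hypothetical model (not ModeShape code) of where a sequencer's relative output lands.
public class OutputLocation {

    // Assumption: final path = path-expression target + "/" + relative output segments.
    static String resolve(String targetPath, List<String> relativeOutputPath) {
        return targetPath + "/" + String.join("/", relativeOutputPath);
    }

    public static void main(String[] args) {
        String target = "/sequenced/msoffice/Test/1d";

        // Before the change: createRelativePath(docName) + METADATA_NODE gives two segments.
        List<String> before = List.of("jcr-spec.doc", "msoffice:metadata");
        // After the change: createRelativePath(METADATA_NODE) gives one segment,
        // matching how the image sequencer's output appears in the tree above.
        List<String> after = List.of("msoffice:metadata");

        System.out.println(resolve(target, before) + " (segments: " + before.size() + ")");
        System.out.println(resolve(target, after) + " (segments: " + after.size() + ")");
    }
}
```

Under that assumption, the one-segment path both passes the size() <= 1 check and drops the file name from the sequenced path, just like the image output.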

This looks like a bug to me.
3. Re: (msoffice) sequencer issues

rhauch Jul 16, 2012 2:46 PM (in response to nl)

Hi. I don't recall why that check is there, but it does look like a bug. Would you mind filing a bug in our JIRA?
4. Re: (msoffice) sequencer issues

nl Jul 17, 2012 1:23 AM (in response to rhauch)

See MODE-1559

Thanks.
