5 Replies Latest reply on Jun 5, 2012 5:54 AM by hchiorean

    Exception when trying to use Microsoft Office Document Sequencer

    jacobdoran

      Hi,

       

      I'm trying to create a simple repository for storing and searching Microsoft Office documents and images, but I'm getting a strange error when trying to sequence Microsoft Office documents. I have the following code to create my repository (exception handling is stripped):

          static final String repositoryName = "repository A";
          static final String repositorySource = "source A";
          static final String sourcePath = "c:\\data\\modeshape\\";

          JcrConfiguration config = new JcrConfiguration();
          config.repositorySource(repositorySource)
                .usingClass(DiskSource.class)
                .setProperty("repositoryRootPath", sourcePath)
                .setProperty("updatesAllowed", true)
                .setDescription("The repository for our content")
                .setProperty("defaultWorkspaceName", workspaceName);

          config.repository(repositoryName)
                .setSource(repositorySource);

          config.textExtractor("Tika Text Extractors")
                .setDescription("Text extractors using Tika parsers")
                .usingClass(org.modeshape.extractor.tika.TikaTextExtractor.class)
                .setProperty("includedMimeTypes", "application/msword,application/vnd.oasis.opendocument.text");

          config.sequencer("Image Sequencer")
                .usingClass("org.modeshape.sequencer.image.ImageMetadataSequencer")
                .loadedFromClasspath()
                .setDescription("Sequences image files to extract the characteristics of the image")
                .sequencingFrom("//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd)[*])/jcr:content[@jcr:data]")
                .andOutputtingTo("/images/$1");

          config.sequencer("Microsoft Office Document Sequencer")
                .usingClass("org.modeshape.sequencer.msoffice.MSOfficeMetadataSequencer")
                .loadedFromClasspath()
                .setDescription("Sequences MS Office documents, including spreadsheets and presentations")
                .sequencingFrom("//(*.(doc|xls|docx|xlsx)[*])/jcr:content[@jcr:data]")
                .andOutputtingTo("/msoffice/$1");

          engine = config.build();
          engine.start();
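To double-check which node paths the Office sequencer's path expression is meant to pick up, the pattern can be approximated with a plain regular expression. This is only a rough sketch: it uses `java.util.regex`, not ModeShape's own path-expression parser, and the class name `SequencerPathSketch` and method `outputPathFor` are made up for illustration. It assumes `$1` stands for the first parenthesised group, that is, the file node's name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; NOT part of ModeShape. It only mimics the intent of
//   //(*.(doc|xls|docx|xlsx)[*])/jcr:content[@jcr:data]  =>  /msoffice/$1
final class SequencerPathSketch {

    // Any parent path, then a node named *.doc|xls|docx|xlsx (optionally with a
    // same-name-sibling index such as [2]), then its jcr:content child.
    private static final Pattern OFFICE_CONTENT = Pattern.compile(
            "(?:/[^/]+)*/([^/]+\\.(?:doc|xls|docx|xlsx)(?:\\[\\d+\\])?)/jcr:content");

    // Returns the approximate output path, or null if the path would not be selected.
    static String outputPathFor(String contentNodePath) {
        Matcher m = OFFICE_CONTENT.matcher(contentNodePath);
        return m.matches() ? "/msoffice/" + m.group(1) : null;
    }
}
```

With this approximation, a content node at `/files/report.doc/jcr:content` maps to `/msoffice/report.doc`, while `/files/photo.png/jcr:content` is not selected by the Office pattern. The sketch matches case-sensitively; whether ModeShape treats extensions the same way is not verified here.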

      Then to store a document I do the following:

          repository = engine.getRepository(repositoryName);
          Session session = repository.login();
          JcrTools tools = new JcrTools();
          Node node = tools.findOrCreateNode(session, fileName, "nt:folder", "nt:file");

          // Upload the file to that node ...
          Node contentNode = tools.findOrCreateChild(node, "jcr:content", "nt:resource");
          contentNode.setProperty("jcr:lastModified", Calendar.getInstance());

          Binary binaryValue = session.getValueFactory().createBinary(stream);
          contentNode.setProperty("jcr:data", binaryValue);

          session.save();
      

      For image files it works perfectly, but for Office documents I get the following error:

      org.modeshape.repository.sequencer.SequencerException: java.lang.IllegalArgumentException: The bytes argument may not be null
           at org.modeshape.repository.sequencer.StreamSequencerAdapter.execute(StreamSequencerAdapter.java:198)
           at org.modeshape.repository.sequencer.SequencingService.processChange(SequencingService.java:498)
           at org.modeshape.repository.sequencer.SequencingService$RepositoryObserver$1.run(SequencingService.java:666)
           at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
           at java.lang.Thread.run(Thread.java:662)
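Since the message complains about a null bytes argument, one thing that can be ruled out on the storing side is an input stream that is null, empty, or already consumed before it reaches `createBinary`. The check below is a minimal sketch using only the JDK; the class `StreamCheck` and its methods are hypothetical names, and it is not confirmed that the stream is actually the cause of this exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical diagnostic helper; not part of ModeShape or JCR.
final class StreamCheck {

    // Reads the whole stream into memory (fine for a quick test, not for huge files).
    static byte[] readFully(InputStream in) throws IOException {
        if (in == null) {
            throw new IllegalArgumentException("stream is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Fails fast if there is no content; otherwise returns a fresh, re-readable stream.
    static InputStream requireContent(InputStream in) throws IOException {
        byte[] bytes = readFully(in);
        if (bytes.length == 0) {
            throw new IllegalStateException("stream is empty or was already consumed");
        }
        return new ByteArrayInputStream(bytes);
    }
}
```

In the storing code it could be used as `session.getValueFactory().createBinary(StreamCheck.requireContent(stream))`. If that call passes and the sequencer still fails, the stream content is unlikely to be the problem.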
       

      I'm a newbie, so I'm probably doing something stupid. I would really appreciate some pointers as to what I'm doing wrong.

       

      Thanks in advance,

      Jacob