7 Replies Latest reply on Mar 26, 2013 12:28 PM by eric.wittmann

    Using S-RAMP as a general purpose content repository

    lkrejci

      I'd like to discuss with you a slightly "twisted" way of using the S-RAMP repository as a general-purpose store for any kind of content, not just SOA-related artifacts.

      IMHO all the pieces are almost there:

      • The notion of mime types
      • Rich querying
      • classification of artifacts

       

      What I am missing though is a more robust handling of mime types and detection of the "features" of the uploaded content that can only be uploaded as generic "Document" artifact type.

       

      For example, in S-RAMP 0.1.1, whatever one uploads as a "Document" ends up with "application/octet-stream" by default, or with whatever mime type the client supplies (see https://issues.jboss.org/browse/SRAMP-178). This is despite the fact that there is already support for extension-based mimetype detection, which I would love to see enhanced with at least mime-magic detection; see https://github.com/metlos/s-ramp/commits/mimetype-detection (I have not sent it in as a pull request yet, because I'd like to have it discussed here first).
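
      To make the idea concrete, here is a minimal sketch of magic-first detection with an extension fallback. The class name MimeTypeSniffer, its tiny lookup tables and the chosen mime types are assumptions made up for illustration; none of this is taken from S-RAMP or from the branch linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not S-RAMP code): magic bytes first, file extension second. */
final class MimeTypeSniffer {
    static final String DEFAULT_TYPE = "application/octet-stream";

    // Deliberately tiny, illustrative table; a real detector would know far more types.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "xml", "application/xml",
            "pdf", "application/pdf",
            "zip", "application/zip",
            "sh", "application/x-sh");

    /** Returns the type found by content sniffing, else by extension, else the default. */
    static String detect(String fileName, byte[] content) {
        String byMagic = detectByMagic(content);
        if (byMagic != null) {
            return byMagic;
        }
        String byExtension = detectByExtension(fileName);
        return byExtension != null ? byExtension : DEFAULT_TYPE;
    }

    /** Looks at the leading bytes only; returns null when nothing is recognized. */
    static String detectByMagic(byte[] content) {
        if (startsWith(content, "%PDF-")) {
            return "application/pdf";
        }
        if (startsWith(content, "<?xml")) {
            return "application/xml";
        }
        if (startsWith(content, "PK\u0003\u0004")) {
            return "application/zip";
        }
        return null;
    }

    /** Case-insensitive lookup of the text after the last dot; null when unknown. */
    static String detectByExtension(String fileName) {
        if (fileName == null) {
            return null;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    private static boolean startsWith(byte[] content, String asciiPrefix) {
        byte[] prefix = asciiPrefix.getBytes(StandardCharsets.US_ASCII);
        if (content.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (content[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

      Content wins over the file name in this sketch, so a PDF uploaded as "report.bin" would still be reported as "application/pdf"; the extension is consulted only when the leading bytes are not recognized.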

       

      Robust detection of mimetypes could then theoretically be used for a more "intelligent" upload mechanism, where the server itself would be able to assess the type of the artifact being uploaded (today, if you upload an XML file as a generic "Document", it will stay as such even though the server is capable of telling its correct artifact type).

       

      While the distinction of the artifacts in S-RAMP is based on the notion of artifact types, which can also be provided by 3rd parties in the form of extended artifact types (which is great for more complex artifacts), I find that approach rather cumbersome for a general purpose content repository. For example, one would have to create a custom extension just for storing "scripts" in the repository, while a simple (and readily available) mimetype detector could be used to classify such files just by mimetype (there is not much more one can analyze in a script that would call for a custom extended artifact type, IMHO).

        • 1. Re: Using S-RAMP as a general purpose content repository
          kurtstam

          Lukas Krejci wrote:

           

           

          What I am missing though is a more robust handling of mime types and detection of the "features" of the uploaded content that can only be uploaded as generic "Document" artifact type.

           

          For example, in S-RAMP 0.1.1, whatever one uploads as a "Document" ends up with "application/octet-stream" by default, or with whatever mime type the client supplies (see https://issues.jboss.org/browse/SRAMP-178). This is despite the fact that there is already support for extension-based mimetype detection, which I would love to see enhanced with at least mime-magic detection; see https://github.com/metlos/s-ramp/commits/mimetype-detection (I have not sent it in as a pull request yet, because I'd like to have it discussed here first).

           

          Robust detection of mimetypes could then theoretically be used for a more "intelligent" upload mechanism, where the server itself would be able to assess the type of the artifact being uploaded (today, if you upload an XML file as a generic "Document", it will stay as such even though the server is capable of telling its correct artifact type).

           

          I completely agree with this; see also my comment on https://issues.jboss.org/browse/SRAMP-178. I'm more than happy to accept a pull request.

           

           

          Lukas Krejci wrote:

           

           

          While the distinction of the artifacts in S-RAMP is based on the notion of artifact types, which can also be provided by 3rd parties in the form of extended artifact types (which is great for more complex artifacts), I find that approach rather cumbersome for a general purpose content repository. For example, one would have to create a custom extension just for storing "scripts" in the repository, while a simple (and readily available) mimetype detector could be used to classify such files just by mimetype (there is not much more one can analyze in a script that would call for a custom extended artifact type, IMHO).

          The artifact types fulfill the role of relating an artifact to an artifact model, where the model is a data structure representing the metadata specific to that artifact type. This is to define a standard protocol so that 3rd party tools can represent and work with the artifact using any S-RAMP compliant repository (and/or the repository can run a deriver to extract data in a standard way). Some artifacts will not need a model; scripts are probably one of them. Again, I fully agree with your assessment that scripts can probably simply use 'Document', and they would have their own mimeType as recognized by the server.

           

          --Kurt

          • 2. Re: Using S-RAMP as a general purpose content repository
            eric.wittmann

            I'll add a couple of points.  Our existing mime-type support is very rudimentary and naive.  I definitely think we would be happy to accept improvements in that area!

             

            As Kurt mentioned in the JIRA issue comment, section 2.2.2 of the spec reads:

             

            contentType:

                •          A string indicating the MIME Media type of the content.  This is set by the server as part of processing the publication of the document, and cannot be changed by the user.

             

            If the server is accepting whatever the HTTP request indicated was the content-type, then that's a bug we need to fix.    I probably implemented it incorrectly.

             

            Lastly, the only thing I don't agree with is the concept of uploading a file to /core/Document and having the server realize it's an XML file (presumably by introspecting the content) and actually storing it as a core/XmlDocument.  I believe that would violate the s-ramp specification.  However, auto-detecting the content type when the upload goes to /core/Document or /ext/ExtendedDocumentType makes a lot of sense.  I'm all for that.

            • 3. Re: Using S-RAMP as a general purpose content repository
              kurtstam

              Eric Wittmann wrote:

               

               

              Lastly, the only thing I don't agree with is the concept of uploading a file to /core/Document and having the server realize it's an XML file (presumably by introspecting the content) and actually storing it as a core/XmlDocument.  I believe that would violate the s-ramp specification.  However, auto-detecting the content type when the upload goes to /core/Document or /ext/ExtendedDocumentType makes a lot of sense.  I'm all for that.

              Right, I think we're in agreement on all of that.

               

              In some cases the Model is closely related to a MimeType (e.g. XmlDocument), but for others it is not (Document). We should probably return an error if we detect an incompatible mimeType/Model combination, since it would probably lead to problems anyway when we go to extract or derive metadata from it. But again, 'Document' can probably accept any mimeType discovered by the server; if we wanted, I guess we could exclude certain mimeTypes if that makes sense.
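
              A rough sketch of such a check, assuming a simple lookup table from model name to accepted mime types. ModelMimeTypeCheck and the table contents are hypothetical and only illustrate the rule; this is not how the server is implemented.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical check (not S-RAMP code) for incompatible mimeType/Model combinations. */
final class ModelMimeTypeCheck {
    // Assumed table: models listed here accept only the given detected mime types.
    // Models that are not listed (for example Document) accept anything.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "XmlDocument", Set.of("application/xml", "text/xml"));

    static boolean isCompatible(String model, String detectedMimeType) {
        Set<String> allowed = ALLOWED.get(model);
        return allowed == null || allowed.contains(detectedMimeType);
    }

    /** Rejects the upload early instead of failing later while deriving metadata. */
    static void requireCompatible(String model, String detectedMimeType) {
        if (!isCompatible(model, detectedMimeType)) {
            throw new IllegalArgumentException(
                    "Mime type " + detectedMimeType + " is not valid for model " + model);
        }
    }
}
```

              Excluding certain mime types for 'Document' would amount to a second, deny-list style table in the same place.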

              • 4. Re: Using S-RAMP as a general purpose content repository
                eric.wittmann

                +1 to all that.

                 

                There's another thing I started thinking about based on this thread.  In the original post, Lukas mentioned:

                 

                detection of the "features" of the uploaded content

                 

                Depending on the specifics of the use-case, one thing that can be done today is to define a custom extended document type and upload all content as that type.  If you did that, you could implement a custom Deriver that could analyze the content for the "features" it has.  The features could be stored on the artifact in whatever way makes sense (as custom properties, as classifications, as derived artifacts, as relationships, etc...).  The Deriver has access to the binary content of the artifact as well as all of its existing meta-data.
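
                As a rough illustration of the idea only: the FeatureDeriver interface below is a made-up stand-in, not the actual S-RAMP Deriver API, and the property names are invented. It shows a deriver that inspects script content and returns the detected "features" as custom properties.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a deriver contract; the real S-RAMP interface differs. */
interface FeatureDeriver {
    /** Receives the binary content and existing meta-data, returns properties to add. */
    Map<String, String> derive(byte[] content, Map<String, String> existingMetadata);
}

/** Example deriver: records the shebang interpreter and the line count of a script. */
final class ScriptFeatureDeriver implements FeatureDeriver {
    @Override
    public Map<String, String> derive(byte[] content, Map<String, String> existingMetadata) {
        // existingMetadata is part of the assumed contract but is not needed here.
        Map<String, String> properties = new LinkedHashMap<>();
        String text = new String(content, StandardCharsets.UTF_8);

        // A leading "#!" line names the interpreter, e.g. "#!/bin/sh".
        String firstLine = text.lines().findFirst().orElse("");
        if (firstLine.startsWith("#!")) {
            properties.put("script.interpreter", firstLine.substring(2).trim());
        }
        properties.put("script.lineCount", String.valueOf(text.lines().count()));
        return properties;
    }
}
```

                The same shape would work for storing features as classifications or relationships instead; custom properties are used here only because they are the simplest to show.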

                • 5. Re: Using S-RAMP as a general purpose content repository
                  lkrejci

                  Kurt Stam wrote:

                   

                  Eric Wittmann wrote:

                   

                   

                  Lastly, the only thing I don't agree with is the concept of uploading a file to /core/Document and having the server realize it's an XML file (presumably by introspecting the content) and actually storing it as a core/XmlDocument.  I believe that would violate the s-ramp specification.  However, auto-detecting the content type when the upload goes to /core/Document or /ext/ExtendedDocumentType makes a lot of sense.  I'm all for that.

                  Right, I think we're in agreement on all of that.

                   

                  In some cases the Model is closely related to a MimeType (e.g. XmlDocument), but for others it is not (Document). We should probably return an error if we detect an incompatible mimeType/Model combination, since it would probably lead to problems anyway when we go to extract or derive metadata from it. But again, 'Document' can probably accept any mimeType discovered by the server; if we wanted, I guess we could exclude certain mimeTypes if that makes sense.

                   

                  The main problem with automated artifact type detection would be that a number of artifact types share the same mimetype, which would make the automated detection more complex, if not impossible. This is in addition to breaking the spec. The reason I raised this point was that human users make mistakes, and if the choice of the artifact type is in their hands, mistakes will happen. Therefore the server side should be as smart as possible to prevent and/or correct the user's mistakes.

                   

                  I think Kurt's suggestion is a nice step in that direction - detect the mimetypes for generic documents and check the mimetypes for more specialized artifact types.

                  • 6. Re: Using S-RAMP as a general purpose content repository
                    lkrejci

                    That is something I haven't completely wrapped my head around yet. The artifact type implies a model, which then implies the available metadata of the artifact. Now there can be derived artifacts defining derived models. I couldn't find a definitive answer in the spec, but the code suggests that for a given artifact type there can be only a single "deriver", which can then create a set of derived artifacts for a given artifact. There is currently ArtifactDeriverFactory, which hardcodes the derivers for the built-in artifact types and is able to load the derivers for extended types.

                     

                    What I don't understand is why it is not possible to supply a custom deriver for the built-in artifact types, or to have multiple derivers "attached" to a single artifact type. The use case I'm trying to capture is that I'd like RHQ to be able to derive its own artifacts off of other, existing artifact types. Let's say we had a model for EARs: I'd like to derive RHQ-specific artifacts off of an EAR artifact, but keep the EAR artifact as the EAR artifact type with its own model, and therefore keep all attached functionality available to others. I'd hate to create a "silo" of RHQ artifacts only usable by RHQ. Instead, I'd like RHQ to be able to tap into the existing "environment" non-destructively. If users create workflows around their SOA artifacts and those artifacts could be used by RHQ, too, I'd like to be able to use those artifacts without affecting their models.

                    • 7. Re: Using S-RAMP as a general purpose content repository
                      eric.wittmann

                      I'm not against enhancing the deriver factory to provide a chain of derivers.  It wouldn't be hard to implement and would introduce additional flexibility in the system.  The idea of custom derivers is implementation-specific, but I don't believe it violates the spec.  It's just something that users would need to be aware of (e.g. swapping out our S-RAMP implementation for another one - you might lose the ability to have custom derivers).
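
                      A minimal sketch of what such a chain could look like, assuming derivers are modelled as plain functions from content to properties. DeriverChainRegistry is a hypothetical name and does not reflect the existing ArtifactDeriverFactory code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry (not S-RAMP code): several derivers per artifact type. */
final class DeriverChainRegistry {
    // Artifact type name -> derivers in registration order.
    private final Map<String, List<Function<byte[], Map<String, String>>>> chains =
            new LinkedHashMap<>();

    /** Appends a deriver to the chain of the given artifact type. */
    void register(String artifactType, Function<byte[], Map<String, String>> deriver) {
        chains.computeIfAbsent(artifactType, key -> new ArrayList<>()).add(deriver);
    }

    /** Runs every deriver registered for the type and merges their results. */
    Map<String, String> deriveAll(String artifactType, byte[] content) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Function<byte[], Map<String, String>> deriver
                : chains.getOrDefault(artifactType, List.of())) {
            // Later derivers overwrite earlier ones if they use the same key.
            merged.putAll(deriver.apply(content));
        }
        return merged;
    }
}
```

                      With something like this, a built-in deriver and an RHQ-specific one could both be registered for the same type, and the artifact would keep its original type and model.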

                       

                      I'll also note that Derivers can add any meta-data they want (to the artifact they are deriving) - they are not restricted to just creating derived *artifacts*.  An example of this is the XSD deriver - it sets the targetNamespace property of the artifact being derived (in addition to creating the XSD derived model artifacts).