13 Replies Latest reply on Mar 5, 2013 9:40 AM by eric.wittmann

    Resolving references in S-RAMP (e.g. WSDL imports)

    eric.wittmann

      An issue that still needs to be solved in the S-RAMP repository is how to resolve references between artifacts.  For the purpose of this post I will use WSDL as an example, but the problem is generic across multiple artifact types (any artifact type that can have references to other artifact instances).

       

      As you know, a WSDL may refer to another WSDL (import, include, redefine).  Various systems (tooling, runtime systems, etc) have different strategies for resolving these external references.  The purpose of this post is to discuss the resolution strategy in S-RAMP.

       

      For context, here is a typical import statement from a WSDL:

       

             <import namespace="http://example.com/stockquote/schemas" location="../common/stockquote.xsd"/>
      

       

      For tooling, resolution is usually straightforward - just follow the location as though it were a file path.  In runtime systems, the strategy usually varies, but often involves a registry of resources keyed by namespace.
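As a rough illustration of what the repository has to work with, the sketch below pulls the namespace and location out of each top-level import in a WSDL 1.1 document. The helper name `wsdl_imports` and the sample document are made up for this example; only the standard library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements such as <definitions> and <import>.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical sample document built around the import statement shown above.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://example.com/stockquote/service">
  <import namespace="http://example.com/stockquote/schemas"
          location="../common/stockquote.xsd"/>
</definitions>"""


def wsdl_imports(wsdl_text):
    """Return (namespace, location) for each top-level WSDL import element."""
    root = ET.fromstring(wsdl_text)
    return [
        (element.get("namespace"), element.get("location"))
        for element in root.findall(f"{{{WSDL_NS}}}import")
    ]


imports = wsdl_imports(SAMPLE)
```

These two values are the only hints available when deciding which artifact the import refers to.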

       

      The S-RAMP repository may contain multiple artifacts that match the given namespace (but with different version numbers).  When adding a WSDL, we need to implement an appropriate strategy for resolving the WSDL imports/includes.

       

      Straw Man Proposal:  Resolution should be done using the namespace (look up a WSDL artifact in the repo by its targetNamespace property).  If multiple artifacts are found, use the one most recently added (assuming it is the one with the highest version number).  I don't think version number can be used, given that the version property is free-form.
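A minimal sketch of the straw man, assuming each artifact carries its targetNamespace and the time it was added. The `Artifact` class and `resolve_by_namespace` function are hypothetical and are not part of any S-RAMP API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Artifact:
    # Hypothetical, simplified stand-in for a repository artifact.
    name: str
    target_namespace: str
    created: datetime   # when the artifact was added to the repository
    version: str = ""   # free-form, so it is not used for ordering


def resolve_by_namespace(artifacts: Iterable[Artifact],
                         namespace: str) -> Optional[Artifact]:
    """Return the most recently added artifact with the given targetNamespace."""
    matches = [a for a in artifacts if a.target_namespace == namespace]
    if not matches:
        return None
    return max(matches, key=lambda a: a.created)


NS = "http://example.com/stockquote/schemas"
repo = [
    Artifact("stockquote.xsd", NS, datetime(2013, 1, 10), "1.0"),
    Artifact("stockquote.xsd", NS, datetime(2013, 2, 20), "second draft"),
]
chosen = resolve_by_namespace(repo, NS)
```

Here `chosen` is the artifact added in February, even though its version string cannot be compared with "1.0".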

       

      An additional consideration is scoping.  If a WSDL is being added as part of a batch, then perhaps resolution should happen within that scope.  For instance, if an S-RAMP package (spec-defined ZIP format) is being added, and a number of WSDLs and Schemas are included in that package, then we should first attempt to resolve references by looking in the package (first by location path and then by namespace).
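The package-scoped lookup could look roughly like this. It assumes the package has already been indexed as a mapping from entry path to targetNamespace; that index and the function name are inventions for this sketch. Leaving an ambiguous namespace match unresolved, so the caller can fall back to the repository, is also an assumption.

```python
import posixpath
from typing import Dict, Optional


def resolve_in_package(package: Dict[str, str], importer_path: str,
                       namespace: str, location: Optional[str]) -> Optional[str]:
    """Return the path of the package entry an import refers to, or None.

    package maps entry path -> targetNamespace (a hypothetical index of the ZIP).
    """
    # 1. Try the location, resolved relative to the importing file.
    if location:
        candidate = posixpath.normpath(
            posixpath.join(posixpath.dirname(importer_path), location))
        if candidate in package:
            return candidate
    # 2. Fall back to the namespace, but only if it is unambiguous.
    by_namespace = [path for path, ns in package.items() if ns == namespace]
    if len(by_namespace) == 1:
        return by_namespace[0]
    return None


package = {
    "wsdl/stockquote.wsdl": "http://example.com/stockquote/service",
    "common/stockquote.xsd": "http://example.com/stockquote/schemas",
}
target = resolve_in_package(package, "wsdl/stockquote.wsdl",
                            "http://example.com/stockquote/schemas",
                            "../common/stockquote.xsd")
```

A `None` result would mean the reference has to be resolved against the repository as a whole.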

       

      I'm considering adding a "linker" phase to the core S-RAMP repository that would follow the "deriver" phase.  This would allow content to be derived into the repository first, and then relationships between artifacts could be resolved after.
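To show why a separate linker phase removes the dependency on upload order, here is a small two-phase sketch. The `add_batch` function and its input shape are hypothetical, and for brevity it assumes namespaces are unique within the batch.

```python
from typing import Dict, List, Tuple


def add_batch(uploads: Dict[str, Tuple[str, List[str]]]):
    """uploads maps file name -> (targetNamespace, list of imported namespaces)."""
    # Phase 1 ("deriver"): store every artifact and remember its imports.
    by_namespace: Dict[str, str] = {}
    pending: List[Tuple[str, str]] = []
    for name, (namespace, imports) in uploads.items():
        by_namespace[namespace] = name
        pending.extend((name, imported) for imported in imports)

    # Phase 2 ("linker"): everything is stored, so order no longer matters.
    relationships = []
    unresolved = []
    for source, imported in pending:
        target = by_namespace.get(imported)
        if target is not None:
            relationships.append((source, "imports", target))
        else:
            unresolved.append((source, imported))
    return relationships, unresolved


# a.wsdl is processed before the file it imports.
relationships, unresolved = add_batch({
    "a.wsdl": ("urn:a", ["urn:b", "urn:elsewhere"]),
    "b.wsdl": ("urn:b", []),
})
```

The import of `urn:b` links correctly even though `b.wsdl` comes second, while `urn:elsewhere` is left for a repository-wide lookup.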

        • 1. Re: Resolving references in S-RAMP (e.g. WSDL imports)
          kurtstam

          1. To find if the file is already in the repo I think we can use fileName, fileSize, contentHash and version:

          • find all files which match fileSize and contentHash
            • no match - no duplication, no reference
            • match (I argue that the version field can be used, maybe ignore case though?)
              • if name and version match - build reference
              • if name matches and version is different: I think this needs to be a validation error, user needs to clean up the repo first
              • if name matches and version is not set on one of the artifacts: add the version - generate warning?
              • if name does not match, regardless of version info - no match - we'd store the artifact as a new file
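The decision tree above can be sketched as follows. The `StoredFile` class and the outcome strings are made up for illustration. Treating more than one same-name match as a validation error follows point 3 below, and comparing versions case-insensitively follows the "maybe ignore case" remark.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StoredFile:
    # Hypothetical record holding the four markers discussed above.
    file_name: str
    file_size: int
    content_hash: str
    version: Optional[str] = None


def classify_upload(new: StoredFile, existing: Iterable[StoredFile]) -> str:
    """Return what to do with an uploaded file, following the list above."""
    same_content = [e for e in existing
                    if e.file_size == new.file_size
                    and e.content_hash == new.content_hash]
    same_name = [e for e in same_content if e.file_name == new.file_name]
    if not same_name:
        return "store-as-new"           # no content match, or name differs
    if len(same_name) > 1:
        return "validation-error"       # the repo needs cleaning up first
    found = same_name[0]
    if (found.version is None) != (new.version is None):
        return "copy-version-and-warn"  # version set on only one of the two
    if found.version is None or found.version.lower() == new.version.lower():
        return "reference-existing"     # name and version match
    return "validation-error"           # same name, different version


repo = [StoredFile("stockquote.xsd", 2048, "abc123", "1.0")]
outcome = classify_upload(StoredFile("stockquote.xsd", 2048, "abc123", "2.0"), repo)
```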

           

          2. For some artifacts we could use more markers than the ones described above, such as namespace. But do we need to do this? Wouldn't the content hash already take care of it, since all of these markers are part of the content?

           

          3. We should not find multiple matches (see 1), this is a validation issue. Cleanup of the repo is required before a reattempt.

           

          4. For a zip upload, I'm assuming we don't have an upload order, which is what you are referring to, I think. So I think it makes sense to wait until all the artifacts are in the repo before creating relations between, e.g., a wsdl and an xsd in the case of an import. This use case is independent of finding a file that is already in the repo. A linker phase seems OK to me.

           

          Question:

           

          How do we handle inline wsdl & xsds versus a wsdl with imported xsds? In that case we have a logical match, but not a physical match.

          Can we ever consider these as equal? A good xsd would have the version in the namespace, so we could do additional comparison work and maybe create a 'possibleMatchRelationShip'? Human interaction can change this to a logicalMatchRelationShip, and we'd keep both versions in the repo?

          • 2. Re: Resolving references in S-RAMP (e.g. WSDL imports)
            eric.wittmann

            1. I don't think it's possible to do linking based on fileSize and contentHash because (when adding the WSDL) these values won't be known.   The only thing we'll know when a particular WSDL file is added to the repository is the "namespace" and (I think optionally) the "location" of the WSDL being imported.  In other words, when A.wsdl is being added to the S-RAMP repository, if it contains an import of B.wsdl (which may or may not already be in the S-RAMP repo) how do we find the B.wsdl artifact so we can add a relationship to it?  I think the filesize and contenthash are relevant when determining whether an artifact being added to the repository is a duplicate (clearly), but not when linking artifacts together. 

             

            4. Right, upload order is a motivating factor.  Although there are other ways to solve the problem, I think adding a linker phase is the right solution. 

             

            I'm not sure I understand the Question you raise at the end.  If an XSD is defined inline, then it will never exist in the S-RAMP repository as a separate artifact.  However, all of the elements, complex types, simple types, and attribute declarations will be added as derived S-RAMP artifacts.  These derived artifacts will be the target for various WSDL specific relationships (e.g. a Part artifact has an 'element' relationships from the Part to, potentially, one of the element declarations coming from the inline schema).  I suspect I'm missing the point. 

            • 3. Re: Resolving references in S-RAMP (e.g. WSDL imports)
              kurtstam

              Eric Wittmann wrote:

               

              1. I don't think it's possible to do linking based on fileSize and contentHash because (when adding the WSDL) these values won't be known.   The only thing we'll know when a particular WSDL file is added to the repository is the "namespace" and (I think optionally) the "location" of the WSDL being imported.  In other words, when A.wsdl is being added to the S-RAMP repository, if it contains an import of B.wsdl (which may or may not already be in the S-RAMP repo) how do we find the B.wsdl artifact so we can add a relationship to it?  I think the filesize and contenthash are relevant when determining whether an artifact being added to the repository is a duplicate (clearly), but not when linking artifacts together. 

               

               

              http://www.w3.org/TR/wsdl#_Toc492291093

               

              "When working with WSDL, it is sometimes desirable to make up a URI for an entity, but not make the URI globally unique for all time and have it "mean" that version of the entity (schema, WSDL document, etc.). There is a particular URI base reserved for use for this type of behavior. The base URI "http://tempuri.org/" can be used to construct a URI without any unique association to an entity. For example, two people or programs could choose to simultaneously use the URI "http://tempuri.org/myschema" for two completely different schemas, and as long as the scope of the use of the URIs does not intersect, then they are considered unique enough. This has the further benefit that the entity referred to by the URI can be versioned without having to generate a new URI, as long as it makes sense within the processing context. It is not recommended that "http://tempuri.org/" be used as a base for stable, fixed entities."

               

              I'm interpreting this to mean that in development mode it's OK not to use a fully unique URI, but that doing so is 'not recommended' at release time. The schemaLocation is optional and helps users or tooling find the xsd. So we *could* require the namespace to be a unique reference, and then use it to look up the xsd in the repo. This same section is referenced when importing other WSDLs.

               

              If we allow non-unique URIs, then we can check whether we have a location, and if it specifies the name of the other wsdl we can use it to reduce the set of matches. We fail if we end up with more than one match. We may want to make 'allowNonUniqueURIs' a config option.
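A rough sketch of that lookup, assuming candidates are available as (file name, targetNamespace) pairs. The function, the exception class and the `allow_non_unique_uris` parameter (standing in for the proposed 'allowNonUniqueURIs' option) are all hypothetical.

```python
import posixpath
from typing import List, Optional, Tuple


class AmbiguousReferenceError(Exception):
    """Raised when more than one artifact could be the import target."""


def find_import_target(candidates: List[Tuple[str, str]], namespace: str,
                       location: Optional[str] = None,
                       allow_non_unique_uris: bool = False) -> Optional[str]:
    """Return the file name of the single matching artifact, or None."""
    matches = [name for name, ns in candidates if ns == namespace]
    # Only narrow by location when non-unique namespaces are tolerated.
    if len(matches) > 1 and allow_non_unique_uris and location:
        wanted = posixpath.basename(location)
        matches = [name for name in matches if name == wanted]
    if len(matches) > 1:
        raise AmbiguousReferenceError(
            f"{len(matches)} artifacts match namespace {namespace}")
    return matches[0] if matches else None


NS = "http://tempuri.org/myschema"
candidates = [("a.xsd", NS), ("b.xsd", NS)]
target = find_import_target(candidates, NS, "../common/b.xsd",
                            allow_non_unique_uris=True)
```

With the option switched off, the same call raises `AmbiguousReferenceError`, which corresponds to treating duplicate namespaces as a validation problem.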

               

               

              Eric Wittmann wrote:

               

               

              I'm not sure I understand the Question you raise at the end.  If an XSD is defined inline, then it will never exist in the S-RAMP repository as a separate artifact.  However, all of the elements, complex types, simple types, and attribute declarations will be added as derived S-RAMP artifacts.  These derived artifacts will be the target for various WSDL specific relationships (e.g. a Part artifact has an 'element' relationships from the Part to, potentially, one of the element declarations coming from the inline schema).  I suspect I'm missing the point. 

              I'm just worried about a WSDL being a logicalMatch to another WSDL.

              • 4. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                eric.wittmann

                If we allow non-unique URIs, then we can check whether we have a location, and if it specifies the name of the other wsdl we can use it to reduce the set of matches. We fail if we end up with more than one match. We may want to make 'allowNonUniqueURIs' a config option.

                I agree completely - we can use the namespace (required) and schemaLocation (optional) to find the right artifact in the repository.  I also have no problem with your discussion of unique vs. non-unique namespaces (I would hope a reasonably mature SOA organization would use unique URIs).  In particular, versioning the WSDL without changing the URI was the use-case I was considering in my original post.  For that use case, there will be multiple artifacts in the repository for the same namespace and schemaLocation, but with different version numbers.  When linking, we would either need to fail or pick the most recent one.

                • 5. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                  kurtstam

                  Eric Wittmann wrote:

                   

                  I agree completely - we can use the namespace (required) and schemaLocation (optional) to find the right artifact in the repository.  I also have no problem with your discussion of unique vs. non-unique namespaces (I would hope a reasonably mature SOA organization would use unique URIs).  In particular, versioning the WSDL without changing the URI was the use-case I was considering in my original post.  For that use case, there will be multiple artifacts in the repository for the same namespace and schemaLocation, but with different version numbers.  When linking, we would either need to fail or pick the most recent one.

                   

                  Ah, now I think I fully appreciate all of your points in the first post. I think we are in agreement:

                   

                  • We fail if no version numbers are supplied,
                  • but if we do find version numbers we'd link to the most recent one. Maybe we can add a classification to warn that we did this so we can display a warning in the UI, and maybe a way for the user to fix it if we picked the wrong one, or approve it (and then we clear the warning),

                   

                  And again we only allow this if the allowNonUniqueURIs property is set to true.
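The agreed rules might be sketched like this. Two points are assumptions on my part: "most recent" is taken to mean most recently added (since the version property is free-form), and linking fails if any candidate lacks a version. The `Candidate` and `LinkError` names are made up, and the returned flag stands in for the warning classification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    # Hypothetical artifact matching the import's namespace and location.
    name: str
    version: Optional[str]
    created: datetime


class LinkError(Exception):
    """Raised when the linker cannot pick a target."""


def pick_link_target(candidates: List[Candidate],
                     allow_non_unique_uris: bool) -> Tuple[Candidate, bool]:
    """Return (target, needs_review); needs_review means "show a warning"."""
    if not candidates:
        raise LinkError("no matching artifact")
    if len(candidates) == 1:
        return candidates[0], False
    if not allow_non_unique_uris:
        raise LinkError("multiple matches and allowNonUniqueURIs is off")
    if any(not c.version for c in candidates):
        raise LinkError("multiple matches without version numbers")
    newest = max(candidates, key=lambda c: c.created)
    return newest, True


found = [
    Candidate("b.wsdl", "1.0", datetime(2013, 1, 5)),
    Candidate("b.wsdl", "1.1", datetime(2013, 2, 9)),
]
target, needs_review = pick_link_target(found, allow_non_unique_uris=True)
```

The `needs_review` flag is what a UI would use to let the user approve the choice or pick a different target.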

                  • 6. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                    eric.wittmann

                    +1

                     

                     

                    • 7. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                      objectiser

                      Based on the other discussion, about artifact versioning - was wondering whether it would be better to setup the relationship to the exact matched artifact, rather than the latest version?

                       

                      As a common artifact (e.g. wsdl) is updated, there will be relationships from services to older versions. If a user wishes to know all services that use a particular wsdl, they would need to traverse the relevant version (isVersionOf) relationships to pick up the related services. So I'm not sure we gain anything by always relating to the latest version; I think it is better to relate to the actual artifact (e.g. wsdl) that matches the one being used.

                       

                      Regards

                      Gary

                      • 8. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                        eric.wittmann

                        Based on the other discussion, about artifact versioning - was wondering whether it would be better to setup the relationship to the exact matched artifact, rather than the latest version?

                        I'm not sure I understand what you mean by exact matched artifact.  Can you expand on that? 

                         

                        Ultimately, when the repository creates relationships from one WSDL to another, it will have (at most) the namespace and location of the other WSDL.

                         

                        As for finding out who references a WSDL ("Find All Services Consuming This WSDL") I think that's a specific gesture in the UI.  The user would navigate to a particular version of a particular WSDL, then choose "Dependency Analysis" or some such action.  The more complicated problem of "Find All Services Consuming ANY VERSION OF This WSDL" would either not be supported or would be a different gesture.

                        • 9. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                          objectiser

                          Eric Wittmann wrote:

                           

                          Based on the other discussion, about artifact versioning - was wondering whether it would be better to setup the relationship to the exact matched artifact, rather than the latest version?

                          I'm not sure I understand what you mean by exact matched artifact.  Can you expand on that? 

                           

                           

                          If we consider the wider dev workflow, then a wsdl may be uploaded to sramp by an architect/service analyst. A developer would then pull down the specified version and begin building the service implementation against that interface.

                           

                          In the meantime, another project may start working on a future version of the service interface, and upload modifications to that interface.

                           

                          When the original developer uploads their service implementation binary, we would want it associated with the original wsdl that they downloaded, primarily so that when an S-RAMP user asks "find all services consuming this wsdl", they get the correct implementations. I believe with the scheme currently proposed, the developer's service would be associated with the updated/incompatible version of the wsdl?

                          • 10. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                            eric.wittmann

                            Ah ha!  Thanks for the clarification.  The use-case you mention wasn't the one I was trying to solve with this post, but you make an excellent point.  The specific use-case that this post is about is how to resolve dependencies when adding a WSDL artifact (which depends on and refers to other WSDLs).  This is needed because the WSDL Deriver needs to automatically create relationships from the WSDL artifact to other WSDL artifacts, but also from derived artifacts to other derived artifacts from other WSDLs.

                             

                            But you are right, the other use-case is when an implementation of a WSDL is uploaded.  There are a number of problems to overcome with that use-case, but I do agree that the artifact resolution algorithm would certainly be different.  If the WSDL is included in the implementation JAR, then we can find an exact match and link to that.  If not, then some other search criteria will be needed.  I'm not sure yet how that will be done, although we'll probably have additional context information when creating these relationships (context we won't have in the above use-case).

                             

                            We'll need to make sure there is a best-practice way to do it so that everything works automatically.  For other approaches we may need to fall back on manually linking the two together.

                            • 11. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                              objectiser

                              Ok understand now

                              • 12. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                                rhauch

                                There is also the scenario of uploading a new version of a WSDL file. In this case, I think the linker should first try to find the older version (aka, the "parent" version) of the WSDL, establish a "isVersionOf" relationship to it, and then take into account the relationships from the parent version to other artifacts when attempting to establish (potentially similar) relationships from the newer version.

                                 

                                Consider a simple case of the repository already containing a WSDL "a.wsdl" (v1) that references another WSDL "b.wsdl" (v1, via "r1") and an XSD "c.xsd" (v2, via "r2"). Then, a new version of "a.wsdl" (v2) is uploaded. The linker could use "r1" and "r2" to find potential targets for similar relationships from "a.wsdl" (v2). I said "potential" because there might be a newer version of "c.xsd" (v3) that should be used instead, but such newer versions could actually be found by navigating "r2" to "c.xsd" (v2) and looking for newer versions. If no newer version is found and a relationship is still needed (because the reference to the same XSD still exists in the uploaded file), then the new relationship from "a.wsdl" (v2) could just point to "c.xsd" (v2).
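The a.wsdl example can be traced with a small sketch. It assumes a linear version chain stored as a mapping from each artifact to the artifact that supersedes it, i.e. the reverse direction of "isVersionOf". The mapping and both function names are hypothetical.

```python
from typing import Dict


def latest_version(artifact: str, newer: Dict[str, str]) -> str:
    """Follow the version chain from artifact to its newest version."""
    seen = {artifact}
    while artifact in newer:
        artifact = newer[artifact]
        if artifact in seen:
            raise ValueError("cycle in version chain")
        seen.add(artifact)
    return artifact


def propose_targets(parent_relationships: Dict[str, str],
                    newer: Dict[str, str]) -> Dict[str, str]:
    """For each relationship of the parent, propose the newest target version."""
    return {rel: latest_version(target, newer)
            for rel, target in parent_relationships.items()}


# Relationships of a.wsdl (v1), and the one newer version in the repository.
parent_relationships = {"r1": "b.wsdl v1", "r2": "c.xsd v2"}
newer = {"c.xsd v2": "c.xsd v3"}
proposed = propose_targets(parent_relationships, newer)
```

For a.wsdl (v2) this proposes b.wsdl (v1) for "r1" and c.xsd (v3) for "r2". Whether each relationship is still needed would have to be checked against the references in the uploaded file.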

                                 

                                Perhaps the linker needs distinct phases, too:

                                1. Identify the parent artifact of the new artifact (this may be identified by the user when they "upload a new version" of an artifact they're looking at, or maybe they have to be prompted to select the parent from search results if there are more than one leaf of the "isVersionOf" tree for a set of matching artifacts)
                                2. Identify potential relationships and targets, either by: including other artifacts included in the upload; by navigating relationships of the parent artifact and walking to find the most recent versions of the parent artifact's relationship targets; and searching for artifacts that match newly added references in the updated artifact (e.g., the newer version of the WSDL contains a new reference to an XSD called "d.xsd")
                                3. Find the best match for each of the needed relationships; the user may need to be involved here

                                 

                                Considering that the user may need to be involved in each of these steps, it may be desirable to store the result of each phase in the repository. For example, if it is not possible to positively identify the target of a particular relationship, perhaps it is possible to create a relationship from the uploaded artifact to a new "Ambiguous Target" object that itself references multiple potential matches (and/or criteria for identifying matches at the time it's asked for)? This records the fact that some user-involvement is necessary to manually resolve/pick the ambiguous relationships from a set of already-found potentials, but it also enables that resolution to be done at a later time.

                                 

                                It might even be possible to turn a concrete relationship into an ambiguous one. A developer or architect might want to do this if they see that the concrete relationship is to the wrong version. This might classify the relationship as potentially invalid, and then the "owner" of the artifact(s) could be notified via some task list or to-dos that the relationship needs to be looked at. If the owner deems the relationship target to be incorrect, they could attempt to resolve the relationship and the system might create the ambiguous relationship structure, allowing the "manual resolution" mechanism to kick in.
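One possible shape for this, purely as an illustration: a relationship whose target is either a concrete artifact or an "ambiguous target" holding the candidates found so far. The class and method names are invented here and do not come from the S-RAMP model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AmbiguousTarget:
    # Candidates the linker found but could not choose between.
    candidates: List[str]


@dataclass
class Relationship:
    source: str
    rel_type: str
    target: Union[str, AmbiguousTarget]

    @property
    def is_ambiguous(self) -> bool:
        return isinstance(self.target, AmbiguousTarget)

    def resolve(self, choice: str) -> None:
        """Manual resolution: pick one of the recorded candidates."""
        if not self.is_ambiguous:
            raise ValueError("relationship is already concrete")
        if choice not in self.target.candidates:
            raise ValueError(f"{choice} is not a recorded candidate")
        self.target = choice

    def reopen(self, candidates: List[str]) -> None:
        """Turn a concrete relationship back into an ambiguous one."""
        self.target = AmbiguousTarget(list(candidates))


rel = Relationship("a.wsdl v2", "imports",
                   AmbiguousTarget(["c.xsd v2", "c.xsd v3"]))
rel.resolve("c.xsd v3")                  # a user picks the newer schema
rel.reopen(["c.xsd v2", "c.xsd v3"])     # later flagged as possibly wrong
```

Because the ambiguous state is an ordinary stored value, resolution can happen at any later time, which is the point made above.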

                                 

                                I guess the short summary is that ambiguous relationships should probably be acknowledged as a valid intermediate persistent state that may only be able to be resolved by manual intervention.

                                • 13. Re: Resolving references in S-RAMP (e.g. WSDL imports)
                                  eric.wittmann

                                  I think this (or something similar) is the level of sophistication we want to eventually achieve, yes.  We can probably do even more to narrow down the set of potential matches.  For example, it's entirely possible that the parent has a different targetNamespace (both for the artifact being uploaded and its references).  In other words, the version tree for a given WSDL or XSD might have artifact instances with different targetNamespaces (this is the ideal state, I think - developers actually versioning their interfaces by altering the targetNamespace).  But in some cases the TN might be the same.  We need to handle both, clearly.

                                   

                                  Your bottom line that there will be cases where we simply can't reliably choose a concrete reference is true, and I agree that the only solution is manual resolution.