The S-RAMP specification (section 4) defines an XPath 2.0-based query language. The S-RAMP Query grammar is actually a subset of the XPath 2.0 grammar. The challenge for an S-RAMP implementation is to properly parse and execute queries that conform to this dialect. This discussion is intended to help us decide the best approach to implementing the Query dialect for the Overlord S-RAMP implementation.
For the most part, the S-RAMP Query dialect is a straightforward use of XPath. However, there are some nuances that must be considered when deciding on an implementation approach. Some of these, in no particular order, are:
- Parsing the query dialect into an AST/model
- Validating the resulting model
- Ensuring the model is suitable for use by the S-RAMP provider (ModeShape right now)
- Handling the classifiedBy set of custom S-RAMP functions
- Determining whether arbitrary relationship depths are supported (e.g. give me all WSDLs that import any XSD that imports xyz.xsd); this is not clear to me after reading the spec
I think the first decision to make is: how will we parse the query into a model?
Some options that come to mind:
- Use JavaCC or some other parser generator to create our own parser specific to the S-RAMP Query dialect (I think I might favor this approach).
- Use an existing XPath query parser such as Saxon or Xalan (it's unclear whether these could easily be leveraged to simply do the parsing)
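To give a feel for the first option, here is a minimal hand-written sketch of parsing into a model, in Python for brevity. It is not a proposal for the real grammar: it assumes a tiny subset of the dialect (a path such as `/s-ramp/wsdl/WsdlDocument` followed by at most one `[@property = 'value']` predicate), and the `Query` and `Predicate` classes are names I made up for illustration. A generated parser would produce something similar, just from a grammar file instead of hand-written code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Predicate:
    prop: str   # property name, without the leading '@'
    value: str  # string literal, without the quotes


@dataclass
class Query:
    segments: List[str]  # e.g. ['s-ramp', 'wsdl', 'WsdlDocument']
    predicate: Optional[Predicate] = None


NAME = r"[\w.\-]+"
# One token: '/', '[', ']', '=', '@property', a single-quoted string, or a bare name.
TOKEN = re.compile(r"\s*(/|\[|\]|=|@" + NAME + r"|'[^']*'|" + NAME + r")")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Recursive-descent parse of the simplified subset described above."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(description, accepts):
        nonlocal pos
        token = peek()
        if token is None or not accepts(token):
            raise ValueError(f"expected {description}, got {token!r}")
        pos += 1
        return token

    def literal(symbol):
        return expect(repr(symbol), lambda t: t == symbol)

    segments = []
    while peek() == "/":
        literal("/")
        segments.append(
            expect("a path segment", lambda t: re.fullmatch(NAME, t) is not None))
    if not segments:
        raise ValueError("query must start with '/'")

    predicate = None
    if peek() == "[":
        literal("[")
        prop = expect("'@property'", lambda t: t.startswith("@"))
        literal("=")
        value = expect("a quoted string", lambda t: t.startswith("'"))
        literal("]")
        predicate = Predicate(prop[1:], value[1:-1])

    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return Query(segments, predicate)


query = parse("/s-ramp/wsdl/WsdlDocument[@name = 'foo.wsdl']")
print(query)
```

Malformed input such as `/s-ramp/wsdl[` raises a `ValueError`, which is roughly where syntax errors would be reported to the client.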
Once the query is parsed into a model, I think static validation is a simple matter of analyzing the model. I don't think there are decisions to be made here.
We do need to make sure that the model we produce is easily consumed by the provider. I'll assume that any model we produce can be visited/traversed to make it easy for a provider to convert the query into something native. In the case of our existing ModeShape provider, I think this actually means converting from S-RAMP XPath into ModeShape XPath. These two dialects are different enough that I believe we would definitely want to convert to a model and then back to XPath again. For other providers the resulting provider query might be SQL or some other native language.
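As a rough illustration of the visit/traverse idea, the sketch below (Python, for brevity) walks a small query model and emits a parameterized SQL statement. Everything about the target is an assumption on my part: the `artifacts` and `properties` tables, their columns, and the model classes are hypothetical, and a ModeShape provider would emit ModeShape XPath from the same kind of visitor instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Predicate:
    prop: str
    value: str

    def accept(self, visitor):
        visitor.visit_predicate(self)


@dataclass
class Query:
    artifact_model: str  # e.g. 'wsdl'
    artifact_type: str   # e.g. 'WsdlDocument'
    predicate: Optional[Predicate] = None

    def accept(self, visitor):
        visitor.visit_query(self)
        if self.predicate is not None:
            self.predicate.accept(visitor)


class SqlQueryVisitor:
    """Builds SQL for a hypothetical relational provider (made-up schema)."""

    def __init__(self):
        self.clauses = []
        self.params = []

    def visit_query(self, query):
        self.clauses.append("a.model = ? AND a.type = ?")
        self.params += [query.artifact_model, query.artifact_type]

    def visit_predicate(self, predicate):
        # Values are passed as bind parameters, never spliced into the SQL text.
        self.clauses.append(
            "EXISTS (SELECT 1 FROM properties p WHERE p.artifact_id = a.id "
            "AND p.name = ? AND p.value = ?)")
        self.params += [predicate.prop, predicate.value]

    def result(self):
        sql = "SELECT a.id FROM artifacts a WHERE " + " AND ".join(self.clauses)
        return sql, self.params


visitor = SqlQueryVisitor()
Query("wsdl", "WsdlDocument", Predicate("name", "foo.wsdl")).accept(visitor)
sql, params = visitor.result()
print(sql)
print(params)
```

The point is that the model stays provider-neutral: each provider supplies its own visitor, and the parser never needs to know what the native query language looks like.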
That leads to the following concern I have: how do we handle S-RAMP classifications?
I think this is a challenging issue that needs to be solved by each provider in its own way. For example, if the provider supports something like XPath and supports user-defined functions (e.g. eXist XML database) then the solution might be straightforward. However, not all providers will have an easy time with this. It's possible that the ontologies will need to be normalized by the provider for easy querying. That would work reasonably well, although making ontology changes after the fact becomes a more challenging operation (existing normalized artifacts would need to be updated).
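To make the normalization idea concrete, here is a small sketch in Python. It assumes, per my reading of the spec, that the non-"exactly" classifiedBy functions should also match artifacts classified by a subclass of the requested class. The ontology, its URIs, and the function names are all invented for illustration. At store time the provider expands each classification into itself plus all of its ancestors, so that the query functions reduce to plain set tests.

```python
# Hypothetical ontology: each class URI maps to its parent (None for a root).
BASE = "http://example.org/regions#"
PARENT_OF = {
    BASE + "World": None,
    BASE + "Europe": BASE + "World",
    BASE + "Germany": BASE + "Europe",
    BASE + "Asia": BASE + "World",
}


def with_ancestors(uri, parent_of):
    """Return the given class URI together with all of its ancestors."""
    closure = set()
    while uri is not None:
        if uri in closure:
            raise ValueError(f"cycle in ontology at {uri}")
        closure.add(uri)
        uri = parent_of[uri]
    return closure


def normalize(classified_by, parent_of):
    """Expand an artifact's classifications; computed once, when it is stored."""
    normalized = set()
    for uri in classified_by:
        normalized |= with_ancestors(uri, parent_of)
    return normalized


def classified_by_all_of(normalized, *uris):
    return set(uris) <= normalized


def classified_by_any_of(normalized, *uris):
    return not normalized.isdisjoint(uris)


# An artifact classified only by 'Germany' also matches a query for 'Europe'.
stored = normalize({BASE + "Germany"}, PARENT_OF)
print(classified_by_all_of(stored, BASE + "Europe"))
print(classified_by_any_of(stored, BASE + "Asia"))
```

The cost mentioned above shows up here directly: if `Germany` were moved under a different parent, every stored artifact whose normalized set contains it would have to be recomputed.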
Another (unrelated) issue with querying is relationships. How deep into the relationship hierarchy can the query go?
The S-RAMP spec provides Query examples like this:
This query should return all WsdlDocument artifacts that include an XSD which has 'someProperty' set to 'true'. That's pretty straightforward, but what about a query like this:
That query should return all WsdlDocument artifacts that include any XSD that itself includes an XSD with 'someProperty' set to 'true'. You can see how the nesting could go arbitrarily deep. Does the S-RAMP spec allow this? I couldn't quite tell based on my reading of it.
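Whatever the spec turns out to say, I suspect the model side of this is not the hard part. The Python sketch below assumes a predicate model in which a relationship predicate can nest another predicate, and evaluates it recursively against an in-memory artifact graph. The class names and the `includedXsds` relationship name are hypothetical, chosen only to mirror the example above; a real provider would have to translate the nesting into joins or sub-queries instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Artifact:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    relationships: Dict[str, List["Artifact"]] = field(default_factory=dict)


@dataclass
class PropertyEquals:
    prop: str
    value: str


@dataclass
class RelationshipPredicate:
    relationship: str
    nested: Union["RelationshipPredicate", PropertyEquals]


def matches(artifact, predicate):
    """True if the artifact satisfies the (possibly nested) predicate."""
    if isinstance(predicate, PropertyEquals):
        return artifact.properties.get(predicate.prop) == predicate.value
    # Relationship predicate: some target must satisfy the nested predicate.
    targets = artifact.relationships.get(predicate.relationship, [])
    return any(matches(target, predicate.nested) for target in targets)


inner_xsd = Artifact("inner.xsd", {"someProperty": "true"})
outer_xsd = Artifact("outer.xsd", relationships={"includedXsds": [inner_xsd]})
wsdl = Artifact("service.wsdl", relationships={"includedXsds": [outer_xsd]})

flag = PropertyEquals("someProperty", "true")
one_level = RelationshipPredicate("includedXsds", flag)
two_levels = RelationshipPredicate("includedXsds", one_level)

print(matches(wsdl, one_level))
print(matches(wsdl, two_levels))
```

Each extra level of nesting is just one more wrapper around the predicate, so the open question is really whether the grammar permits it and whether providers can execute it efficiently.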
I'll stop here - let the discussion begin!!