If I understand you, you want to create a repository showing the structure of the HTML, JSP, Java, SQL and ETL scripts, and then relate all those to show the traceability?
As mentioned in another thread, we're hoping to release the sequencing framework very soon. And I hope that the examples and docs will help people understand how to write and use sequencers.
Serge is working on a Java sequencer, but SQL, HTML and even JSP are also on my list of great-to-haves, as are ETL scripts. Any desire to start work on one of those? Perhaps just start by sketching out the JCR schema for these?
What kinds of MetaMatrix models are you using? Relational only, or also XML?
Most of our interfaces to MetaMatrix are web services interfaces.
So to get a full view of related metadata, I can imagine it would have to include: WSDL and XSD (currently registered in UDDI) -> web service procedure transformation code -> XML model transformation code -> virtual views and virtual procedures -> physical resources (databases and web services).
The idea is to be able to trace lineage from an attribute in an XSD all the way back to the source, and to show the transformations applied to the attribute (upper, concat, CASE if...END, eval, etc.) along with other relevant information about how the attribute flows from a source to a consumption point.
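To make the lineage idea concrete, here is a minimal sketch of walking from an XSD attribute back to its physical sources. It assumes lineage is stored as a simple mapping from each element to its upstream elements plus the transformation applied; every name in it (the `LINEAGE` table, the `xsd:`/`view:`/`table:` prefixes, the column names) is made up for illustration and does not come from a real MetaMatrix model or the DNA API.

```python
# Hypothetical lineage table: node -> list of (upstream node, transformation).
# All identifiers below are invented for this sketch.
LINEAGE = {
    "xsd:Customer/fullName": [("xmldoc:CustomerDoc/fullName", "none")],
    "xmldoc:CustomerDoc/fullName": [("view:CustomerView.FULL_NAME", "none")],
    "view:CustomerView.FULL_NAME": [
        ("table:CRM.CUSTOMER.FIRST_NAME", "UPPER(CONCAT(FIRST_NAME, ' ', LAST_NAME))"),
        ("table:CRM.CUSTOMER.LAST_NAME", "UPPER(CONCAT(FIRST_NAME, ' ', LAST_NAME))"),
    ],
}


def trace(node, lineage):
    """Return every (target, upstream, transformation) step behind `node`.

    Assumes the lineage graph has no cycles.
    """
    steps = []
    for upstream, transformation in lineage.get(node, []):
        steps.append((node, upstream, transformation))
        steps.extend(trace(upstream, lineage))
    return steps


def physical_sources(node, lineage):
    """Return the nodes with no recorded upstream, i.e. the physical sources."""
    upstreams = lineage.get(node, [])
    if not upstreams:
        return {node}
    sources = set()
    for upstream, _transformation in upstreams:
        sources |= physical_sources(upstream, lineage)
    return sources


steps = trace("xsd:Customer/fullName", LINEAGE)
sources = physical_sources("xsd:Customer/fullName", LINEAGE)
```

In a repository these edges would presumably be node references rather than a dictionary, but the traversal would look much the same.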
So I can imagine repositories from various domains being federated through DNA.
MetaMatrix exposure domain -> database storage domain -> ETL loading domain. Currently our world stops at the MetaMatrix interface, but you're correct that ideally you would keep going up through the stack towards end users (Java/Axis -> JSP -> HTML).
The MetaMatrix domain can show lineage from a web service to a physical database table. Further back is where the database comes into the picture, and there we are in the database domain (views, tables, procedures).
The Database domain can show various procedures and view scripts, giving insight into how the database tables are populated and any transformation there. This domain might also have supporting diagrams (ERD, Visio) that can be referenced and discoverable.
The ETL domain would be able to show which systems the data is coming from, and maybe also some information about when the ETL loads are run.
I am currently trying to sketch out how a MetaMatrix model could be structured in Jackrabbit. I am more or less doing it from scratch and ignoring the current MetaMatrix repository structure, which might be a waste of time since the MetaMatrix team will for sure come up with something better. I am also looking into WSDLs and schemas.
I would be interested in working on sequencers for doc/lit web services (not exactly sure what the scope would be). My main domain is EII, and I am very interested in getting a MetaMatrix sequencer off the ground.
I was thinking about this a bit.
To understand JCR better, I would play around with Jackrabbit. It doesn't really matter which domain I work with. I think a good understanding of JCR is probably needed to understand the concept of sequencers and to write good, efficient ones. As for which sequencer to build, I am thinking that a SQL/JDBC sequencer might be an idea, since it could connect to the MetaMatrix repository VDB using JDBC. A JDBC sequencer seems to support quite a few use cases I can think of, since many repositories are probably persisted in a relational data store.
Yes, you're definitely going in the right direction. Getting a snapshot of multiple relational DBs into the repository by directly connecting to them is very important.
But I think that works better with DNA's federation model, where the repository content is dynamically built ("contributed") from JCR repositories, JDBC data sources, UDDI registries, SCM systems, etc. Clients connecting to the federated repository see all the federated content, some of which is really only a cached representation while other content is literally a local snapshot.
This federation model uses connectors to talk to the different kinds of sources, and to solve your use case there would be a connector that would talk to JDBC and dynamically expose the database metadata to the rest of the repository. (You could also do this with data, although that may require a connector that is much more aware of the specific schema.)
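As a rough illustration of what "expose the database metadata to the rest of the repository" could mean, here is a small sketch that reads a schema and projects it as path-addressed nodes. It uses Python's built-in `sqlite3` module as a stand-in for a JDBC source, and the `expose_schema` function, the `/sources/crm` mount path, and the property names are all assumptions for this example, not part of any DNA connector API.

```python
import sqlite3


def expose_schema(conn, mount_path="/sources/crm"):
    """Project tables and columns as repository-style nodes keyed by path.

    The mount path and property names are hypothetical.
    """
    nodes = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        table_path = f"{mount_path}/{table}"
        nodes[table_path] = {"type": "table"}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            nodes[f"{table_path}/{column}"] = {
                "type": "column",
                "datatype": col_type,
                "nullable": not notnull,
                "primaryKey": bool(pk),
            }
    return nodes


# A throwaway in-memory database stands in for the real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
nodes = expose_schema(conn)
```

Because the nodes are built on request from the live connection, a client would see the schema as it is at that moment, which is the difference from a sequencer writing a one-time snapshot into the repository.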
As soon as the sequencer release is out, we'll start on the federation. Federation just has a ton of possibilities.
But back to sequencers. One major characteristic of DNA sequencers is that DNA listens to an existing repository, watching it for changes. If a change to a node fits the pattern for a sequencer, that sequencer is then run against that node. So far I've assumed that this content consists of file content, and the sequencer reads it and constructs a representation of that content as a graph. For example, if the file is a DDL file, the DDL sequencer would extract the table structure and create a graph of the tables, columns, keys, indexes, etc. (The sequencer configuration can specify where in the repository the output graph is to be saved.)
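The watch, match, and sequence flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SEQUENCERS` configuration, the `on_node_changed` callback, the path patterns, and the output location are hypothetical, the repository is a plain dictionary, and the DDL "parser" is a toy regular expression that only handles simple `CREATE TABLE` statements (no keys, indexes, or types containing commas).

```python
import fnmatch
import re


def ddl_sequencer(content):
    """Toy DDL sequencer: turn simple CREATE TABLE statements into a graph."""
    graph = {}
    pattern = r"CREATE\s+TABLE\s+(\w+)\s*\((.*?)\)\s*;"
    for match in re.finditer(pattern, content, re.IGNORECASE | re.DOTALL):
        table, body = match.group(1), match.group(2)
        columns = {}
        for definition in body.split(","):
            parts = definition.split()
            if len(parts) >= 2:
                columns[parts[0]] = {"datatype": parts[1]}
        graph[table] = {"columns": columns}
    return graph


# Hypothetical configuration: (path pattern, sequencer, output location).
SEQUENCERS = [
    ("/files/*.ddl", ddl_sequencer, "/sequenced/ddl"),
]


def on_node_changed(path, content, repository):
    """Run every sequencer whose pattern matches the changed node's path."""
    for pattern, sequencer, output_root in SEQUENCERS:
        if fnmatch.fnmatchcase(path, pattern):
            name = path.rsplit("/", 1)[-1]
            repository[f"{output_root}/{name}"] = sequencer(content)


repository = {}
on_node_changed(
    "/files/schema.ddl",
    "CREATE TABLE customer (id INTEGER, name TEXT);",
    repository,
)
# This change matches no pattern, so no sequencer runs.
on_node_changed("/files/readme.txt", "hello", repository)
```

The point of the sketch is that the sequencer itself never decides when to run; the change notification and the pattern match do.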
However, I'm trying to think of whether a sequencer could also connect to an external system (e.g., a db) and load the data from there. The sequencers only run based upon some change in the repository (that matches the sequencer's pattern), so what could that change represent? Maybe the data source information?
Thoughts? Maybe I need to explain the federation concept in more detail.
There are several new JIRA issues where we're tracking work and requirements for several sequencers, including:
- MetaMatrix models (DNA-31)
- XSDs (DNA-32)
- WSDL (DNA-33)
There are other feature requests for other sequencers, and several for federation connectors, including a connector that dynamically accesses the schema from a relational database (DNA-37).
I am very curious as to your views on this thread as it relates to the current state of ModeShape.
I am a firm believer that, given what ModeShape is maturing into, it should be strongly considered as the core of any enterprise metadata solution... Especially when you take into account the strong integration with Teiid, it finally brings to bear what I call "Active" metadata management... Finally, metadata should be MORE than just pretty plots on walls!!