The registry solution is a lot better since it would allow, for example, two different canonical formats; however, it's difficult to design because of the following:
A -> C, C -> B
A -> D, D -> B
get(a->b): which route would you choose? It would have to be a weighted graph, where each route has its own priority. On the other hand, if it's done, it would create lots of nice possibilities (although a bad configuration could also create very long chains, making it nearly impossible to trace why a field ended up transformed the way it did).
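To make the weighted-graph idea concrete, here is a minimal sketch that picks the cheapest route with Dijkstra's algorithm. The class name, the method names and the integer costs are all made up for illustration; nothing here is SwitchYard API.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names come from SwitchYard.
final class WeightedRoutes {
    // from -> (to -> cost); a lower cost means a higher priority.
    private final Map<String, Map<String, Integer>> edges = new HashMap<>();

    void add(String from, String to, int cost) {
        edges.computeIfAbsent(from, k -> new HashMap<>()).put(to, cost);
    }

    // Dijkstra's algorithm: returns the cheapest chain of formats,
    // or an empty list when no route exists.
    List<String> cheapest(String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        Comparator<Map.Entry<String, Integer>> byCost = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(byCost);
        dist.put(from, 0);
        queue.add(new AbstractMap.SimpleEntry<>(from, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> head = queue.poll();
            String node = head.getKey();
            if (head.getValue() > dist.get(node)) continue; // stale queue entry
            if (node.equals(to)) break;
            Map<String, Integer> out = edges.getOrDefault(node, Collections.emptyMap());
            for (Map.Entry<String, Integer> e : out.entrySet()) {
                int candidate = head.getValue() + e.getValue();
                if (candidate < dist.getOrDefault(e.getKey(), Integer.MAX_VALUE)) {
                    dist.put(e.getKey(), candidate);
                    prev.put(e.getKey(), node);
                    queue.add(new AbstractMap.SimpleEntry<>(e.getKey(), candidate));
                }
            }
        }
        if (!dist.containsKey(to)) return Collections.emptyList();
        LinkedList<String> path = new LinkedList<>();
        for (String n = to; n != null; n = prev.get(n)) path.addFirst(n);
        return path;
    }
}
```

With A -> C and C -> B at cost 1 each, and A -> D at cost 1 but D -> B at cost 5, `cheapest("A", "B")` goes via C. If two routes tie on cost, the winner depends on map iteration order, so a real design would need an explicit tie-break rule.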
I think the degenerate case here is providing the ability to link transformers via configuration and getting the runtime to execute the transformers as part of the exchange. Once we have that in place, we can see if that's sufficient from a usability perspective or if we need to get fancy with automatically deriving intermediate transformation formats. My gut tells me that we end up spending lots of time developing such a solution and it turns out to be brittle and error prone, and ultimately people just end up coding around it.
I think I'd prefer a more explicit chaining/pipelining of transformations. When things get too magical, I find they become a pita for users (debugging etc). Maybe it'd be OK if there's a single non-ambiguous route from format A to format B via a single intermediate format. I think if the options ballooned any more than that, I'd prefer to see the user being asked to intervene and give some direction.
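A rough sketch of that rule, purely for illustration: use a direct transformer if there is one, accept exactly one intermediate format, and otherwise refuse and ask the user to configure the chain. `SingleHopResolver` and its methods are hypothetical names, not anything that exists in SwitchYard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from SwitchYard.
final class SingleHopResolver {
    // from -> set of formats reachable with one registered transformer
    private final Map<String, Set<String>> edges = new HashMap<>();

    void add(String from, String to) {
        edges.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
    }

    // Returns [from, to] for a direct transformer, or [from, mid, to] when
    // exactly one intermediate format works. Throws when there is no route
    // or more than one, so that the user has to choose explicitly.
    List<String> resolve(String from, String to) {
        Set<String> next = edges.getOrDefault(from, Collections.emptySet());
        if (next.contains(to)) return Arrays.asList(from, to);
        List<String> mids = new ArrayList<>();
        for (String mid : next) {
            if (edges.getOrDefault(mid, Collections.emptySet()).contains(to)) {
                mids.add(mid);
            }
        }
        if (mids.size() == 1) return Arrays.asList(from, mids.get(0), to);
        if (mids.isEmpty()) {
            throw new IllegalStateException("No route from " + from + " to " + to);
        }
        throw new IllegalStateException("Ambiguous route from " + from + " to " + to
                + " via " + mids + "; configure one explicitly");
    }
}
```

The point of throwing on ambiguity is that the error message lists the candidate intermediates, so the user sees exactly which choice they are being asked to make.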
We have an approach somewhat related in Smooks that could possibly be leveraged here too. We call it "Model Driven Transformation". You can see some examples here if interested.
The Smooks example brings up an interesting point. It's important to remain agnostic as to the representation of the canonical data format. It could be XML, Java, CSV, whatever. I think XML offers the widest range of options for an intermediate format, but it's all gonna depend on the environment.
Keith Babo wrote:
The Smooks example brings up an interesting point. It's important to remain agnostic as to the representation of the canonical data format. It could be XML, Java, CSV, whatever.
Keith Babo wrote:
I think XML offers the widest range of options for an intermediate format, but it's all gonna depend on the environment.
Not so sure about that. Using XML as an intermediate format also offers wide-ranging scope for things to go pear-shaped, unless you add the additional overhead of validating the XML canonical form every time, before applying the transform to the target format.
Sure it would be a good idea to validate a Java based canonical model too, but in that case it would be just validation of the content of the model... not its structure too (the compiler has already done that for you). Add to that the fact that you can debug it...
I guess it'd come down to personal preference in a lot of cases (and env issues as you say) and I'm sure some would prefer to use XML.
Interesting thread. Is there any reason Google Protocol Buffers is not being considered as an alternative to XML? I think it meets the same criteria as XML with all the self-describing characteristics, and it directly solves the serialization issue. In addition to this, there are libraries for almost all common languages, and Google maintains the C++, Python, and Java implementations. However, I think the documentation of all the capabilities of GPB is lacking. Many of the dynamic features are not well documented, nor is the use of field options, but overall my experiences with the software have been extremely positive.
Perhaps it's just my lack of experience with GPB, but I'm not sure how it would apply as an option for a canonical data format. In any case... I don't think we'll be doing anything in SwitchYard that would rule anything out. This would be an implementation choice.
As Tom said, you could really use any format for your canonical message format within SwitchYard. The transformer support simply takes a name for the from and to type and looks for the appropriate transformer to move between them. Generally speaking, people will choose a canonical format which provides straightforward mapping options, since all data formats are supposed to map to the canonical form. XML is one example, since there is a well-defined structure, standard serialization rules, and plentiful tooling options.
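As a rough illustration of the "look up a transformer by from and to name" idea, a lookup table keyed on the name pair could look like the sketch below. This is not SwitchYard's actual registry; `NamedTransformerRegistry`, its methods and the format names in the usage note are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: not the SwitchYard registry API.
final class NamedTransformerRegistry {
    // Keyed on the (from, to) name pair; a List key gives value equality.
    private final Map<List<String>, Function<Object, Object>> byPair = new HashMap<>();

    void register(String from, String to, Function<Object, Object> transformer) {
        byPair.put(Arrays.asList(from, to), transformer);
    }

    // Returns the transformer registered for exactly this pair, if any.
    // No chaining or type hierarchy is considered here.
    Optional<Function<Object, Object>> find(String from, String to) {
        return Optional.ofNullable(byPair.get(Arrays.asList(from, to)));
    }
}
```

For example, after `register("csv:order", "xml:order", ...)`, a call to `find("csv:order", "xml:order")` returns that transformer, while `find("xml:order", "csv:order")` comes back empty because the pair is directional.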
To be honest, I'm not all that familiar with GPB, but I just took a quick cruise through their developer guide. Seems like you could have a generalized transformer implementation that takes a .proto file as configuration and maps between an input message and a Java type. So instead of having a transform.java or transform.smooks, it would be a transform.proto. I'm guessing the .proto file could then be passed to external consumers of the service which could use it to create a request message from multiple languages.
Does that sound like what you had in mind? Again, we don't really force SwitchYard users into a specific (or even a single) canonical data format, so you can choose the solution that's best for you.
You are on the same track that I am. GPB lacks a true inheritance model, but the extensibility built into it is very powerful. In addition, the reason our program adopted the technology is the ease of change. When digging in, what we found was that you could define a very minimal set of 'message' properties and make the necessary changes on the fly in code. The reasons one might choose XML are typically the robust serialization tools (marshalling), the ability to be self-describing, the flexibility to define what you need quickly, and the general multi-language support. These capabilities are all available in GPB, and because of the code generation features the marshalling time is considerably faster. So when I see something like Smooks - which I also love and have considered writing .proto capabilities into - I think if the core behavior was in GPB (or a similar technology, e.g. Thrift) the performance would improve greatly without an impact on flexibility. Plus you generally get multiple language support.
Performance Trade Study on Marshallers...
Here's another thread that might interest you:
It's more about our internal serialization requirements, but I think there may be an opportunity to synthesize some of the work we are doing with transformers in that area.
In my project we want to be able to support different versions of services, which means dealing with a large number of formats that have nearly identical requirements.
Creating a transformation from every format to every other format doesn't seem efficient.
After writing a small example, I realized that with the current version (0.6.0.Beta1) chaining of transformers isn't possible. (A JIRA search didn't yield any results either.)
Is there some plan for supporting this feature?
Can you provide a bit more detail on your specific scenario? Transformer chaining is not yet supported in SY, but it is an area that we're interested in addressing.
Imagine the following simple scenario:
ServiceV1 -> TransformerV1 -> ReferenceV1
At some point we want to introduce a newer version of the service, let's say V2.
Without chaining, we have to create a new transformer so that:
ServiceV2 -> TransformerV2 -> ReferenceV1
Now if chaining was allowed, it would be possible to keep the old transformer and add a new one:
ServiceV2 -> TransformerV2toV1 -> TransformerV1 -> ReferenceV1
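In plain Java terms, the chained version amounts to composing two functions. The sketch below uses `java.util.function.Function.andThen` for that; the `OrderV1` and `OrderV2` types and their fields are invented for illustration and say nothing about how SwitchYard would wire this up.

```java
import java.util.function.Function;

// Hypothetical sketch: the message types and fields are made up.
final class VersionChainExample {
    static final class OrderV1 {
        final String customer;
        OrderV1(String customer) { this.customer = customer; }
    }

    static final class OrderV2 {
        final String firstName;
        final String lastName;
        OrderV2(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Existing transformer, kept unchanged (TransformerV1 in the scenario).
    static final Function<OrderV1, String> V1_TO_REFERENCE =
            order -> "customer=" + order.customer;

    // New transformer that only knows how to map V2 back to V1.
    static final Function<OrderV2, OrderV1> V2_TO_V1 =
            order -> new OrderV1(order.firstName + " " + order.lastName);

    // The chain: V2 -> V1 -> reference format, built by composition.
    static final Function<OrderV2, String> V2_TO_REFERENCE =
            V2_TO_V1.andThen(V1_TO_REFERENCE);
}
```

The upside is that only the V2-to-V1 step is new code; anything that later changes in the V1-to-reference mapping is picked up by both versions.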
That's an interesting perspective. The transformer pairs essentially form a directed graph, so it should be possible to walk the graph to find a multi-step transform.
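One way such a graph walk could look is a breadth-first search that composes transformers along the route with the fewest hops. This is only a sketch under that assumption; `TransformGraph` and its methods are hypothetical and not part of SwitchYard.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from SwitchYard.
final class TransformGraph {
    // from -> (to -> transformer); each registered pair is a directed edge.
    private final Map<String, Map<String, Function<Object, Object>>> edges = new HashMap<>();

    void add(String from, String to, Function<Object, Object> transformer) {
        edges.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, transformer);
    }

    // Breadth-first search: returns the composition of the transformers
    // along a route with the fewest hops, or empty if "to" is unreachable.
    Optional<Function<Object, Object>> find(String from, String to) {
        // Maps each visited format to the composed transform that reaches it.
        Map<String, Function<Object, Object>> reached = new HashMap<>();
        reached.put(from, Function.identity());
        Deque<String> queue = new ArrayDeque<>();
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(to)) return Optional.of(reached.get(node));
            Map<String, Function<Object, Object>> out =
                    edges.getOrDefault(node, Collections.emptyMap());
            for (Map.Entry<String, Function<Object, Object>> e : out.entrySet()) {
                if (!reached.containsKey(e.getKey())) { // visit each format once
                    reached.put(e.getKey(), reached.get(node).andThen(e.getValue()));
                    queue.add(e.getKey());
                }
            }
        }
        return Optional.empty();
    }
}
```

Because each format is visited only once, cycles in the transformer pairs cannot cause an endless walk. When several routes have the same number of hops, the one found first wins, which is exactly the ambiguity raised earlier in the thread.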