16 Replies. Latest reply on Nov 1, 2012 3:58 PM by kcbabo

Chaining Transformers

kcbabo Jan 18, 2011 8:43 PM

This discussion involves the first use case "Chaining transformers, case intermediate format" listed here:

http://community.jboss.org/wiki/Transformationusecases

The idea being that an organization may have a canonical data format through which all data mappings occur. Each format defines a mapping to the canonical form, which allows for any two formats to be translated into one another via the intermediate, canonical format. e.g.

InvoiceType1 <-> CanonicalInvoice
InvoiceType2 <-> CanonicalInvoice
InvoiceType1 -> CanonicalInvoice -> InvoiceType2
InvoiceType2 -> CanonicalInvoice -> InvoiceType1

For SwitchYard, this means that a transformation between a "from" and a "to" type may require multiple individual transformations chained together.

Transformer[] chain = {
   new Transformer("InvoiceType1", "CanonicalInvoice"),
   new Transformer("CanonicalInvoice", "InvoiceType2") };

This should be relatively straightforward to support with SwitchYard. The TransformHandler must be prepared to execute multiple transforms to get the job done. One mechanism would involve an ordered list of Transformers set in the message or exchange context. The TransformHandler would simply pick them up and fire them in order. Another approach would involve a more intelligent query capability in the TransformerRegistry, e.g. "I have 'a' and I need to get to 'c'". The registry would have to traverse the graph of available to/from transformations to figure out if that was possible.
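A minimal sketch of the ordered-list mechanism, for illustration only. NamedTransformer and TransformChain are hypothetical stand-ins, not the SwitchYard API; the sketch assumes each transformer carries a "from" and "to" type name plus a conversion function, and that the handler checks each step lines up with the previous one.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a transformer: a from/to type name pair plus the
// conversion logic. This is NOT the real SwitchYard Transformer interface.
final class NamedTransformer {
    final String from;
    final String to;
    final Function<Object, Object> fn;

    NamedTransformer(String from, String to, Function<Object, Object> fn) {
        this.from = from;
        this.to = to;
        this.fn = fn;
    }
}

// Hypothetical helper showing what a handler could do with an ordered chain.
final class TransformChain {
    // Fires each transformer in order, verifying that every step's "from"
    // type matches the type produced by the previous step.
    static Object apply(List<NamedTransformer> chain, String inputType, Object payload) {
        String currentType = inputType;
        Object current = payload;
        for (NamedTransformer t : chain) {
            if (!t.from.equals(currentType)) {
                throw new IllegalStateException(
                    "Chain broken: have " + currentType + " but next step expects " + t.from);
            }
            current = t.fn.apply(current);
            currentType = t.to;
        }
        return current;
    }
}
```

With a chain of InvoiceType1 -> CanonicalInvoice and CanonicalInvoice -> InvoiceType2, the payload passes through both functions in sequence; a misordered chain fails fast instead of producing a wrongly typed message.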

I will work up a config and example code to demonstrate what this would look like.

1. Chaining Transformers

burmanm Jan 19, 2011 3:23 AM (in response to kcbabo)

The registry solution is a lot better, since it would allow, for example, two different canonical formats. However, it's difficult to design because of the following:

Transformers:

A -> C, C -> B
A -> D, D -> B

For get(a->b), which route would you choose? It would have to be a weighted graph, where each route has its own priority. On the other hand, if it's done, it would create lots of nice possibilities (though a bad configuration could also create very long chains, making it nearly impossible to trace back why a given field ended up where it did).
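To make the weighted-graph idea concrete, here is a rough sketch using Dijkstra's algorithm over type names. Everything in it is an assumption for illustration: the WeightedRoutes class, the integer cost per transformer (lower is preferred), and the register/cheapest methods are hypothetical and not part of SwitchYard.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical registry of weighted from -> to transformer edges.
final class WeightedRoutes {
    private static final class Edge {
        final String to;
        final int cost;
        Edge(String to, int cost) { this.to = to; this.cost = cost; }
    }

    private static final class Node {
        final String type;
        final int dist;
        Node(String type, int dist) { this.type = type; this.dist = dist; }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();

    // Registers a transformer with a configured cost (lower is preferred).
    void register(String from, String to, int cost) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, cost));
    }

    // Dijkstra's algorithm: returns the cheapest route as a list of type
    // names, or an empty list if "to" cannot be reached from "from".
    List<String> cheapest(String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<Node> pq =
            new PriorityQueue<Node>((a, b) -> Integer.compare(a.dist, b.dist));
        dist.put(from, 0);
        pq.add(new Node(from, 0));
        while (!pq.isEmpty()) {
            Node cur = pq.poll();
            if (cur.dist > dist.get(cur.type)) continue; // stale queue entry
            if (cur.type.equals(to)) break;
            for (Edge e : edges.getOrDefault(cur.type, Collections.<Edge>emptyList())) {
                int d = cur.dist + e.cost;
                if (d < dist.getOrDefault(e.to, Integer.MAX_VALUE)) {
                    dist.put(e.to, d);
                    prev.put(e.to, cur.type);
                    pq.add(new Node(e.to, d));
                }
            }
        }
        if (!dist.containsKey(to)) return Collections.emptyList();
        LinkedList<String> path = new LinkedList<>();
        for (String n = to; n != null; n = prev.get(n)) path.addFirst(n);
        return path;
    }
}
```

Note that when two routes have equal total cost, this sketch picks one arbitrarily, which is exactly the ambiguity raised above; a real design would need a tie-breaking rule or an explicit error.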
2. Chaining Transformers

kcbabo Jan 19, 2011 9:18 AM (in response to burmanm)

I think the degenerate case here is providing the ability to link transformers via configuration and getting the runtime to execute the transformers as part of the exchange. Once we have that in place, we can see if that's sufficient from a usability perspective or if we need to get fancy with automatically deriving intermediate transformation formats. My gut tells me that we end up spending lots of time developing such a solution and it turns out to be brittle and error prone, and ultimately people just end up coding around it.
3. Chaining Transformers

tfennelly Jan 19, 2011 9:19 AM (in response to burmanm)

I think I'd prefer a more explicit chaining/pipelining of transformations. When things get too magical, I find they become a pita for users (debugging etc). Maybe it'd be OK if there's a single non-ambiguous route from format A to format B via a single intermediate format. I think if the options bubbled any more than that I'd prefer to see the user being asked to intervene and give some direction.
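One way to picture the "single non-ambiguous route" rule is the sketch below. SingleHopResolver and its methods are hypothetical names, not anything in SwitchYard or Smooks; the sketch assumes transformers are registered as from/to type name pairs and only ever considers one intermediate format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical resolver that chains automatically only when exactly one
// intermediate format links "from" to "to".
final class SingleHopResolver {
    private final Map<String, Set<String>> targets = new HashMap<>();

    void register(String from, String to) {
        targets.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
    }

    // Returns the single intermediate type X such that from -> X and X -> to
    // are both registered. Returns empty if there is none, and throws if
    // there is more than one so the user has to give direction.
    Optional<String> intermediate(String from, String to) {
        List<String> candidates = new ArrayList<>();
        for (String mid : targets.getOrDefault(from, Collections.<String>emptySet())) {
            if (targets.getOrDefault(mid, Collections.<String>emptySet()).contains(to)) {
                candidates.add(mid);
            }
        }
        if (candidates.size() > 1) {
            throw new IllegalStateException(
                "Ambiguous route from " + from + " to " + to + " via " + candidates);
        }
        return candidates.isEmpty() ? Optional.<String>empty() : Optional.of(candidates.get(0));
    }
}
```

With A -> C and C -> B registered, the resolver returns C; once A -> D and D -> B are added as well, it refuses to guess and raises an error instead.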

We have an approach somewhat related in Smooks that could possibly be leveraged here too. We call it "Model Driven Transformation". You can see some examples here if interested.
4. Chaining Transformers

kcbabo Jan 19, 2011 9:49 AM (in response to tfennelly)

The Smooks example brings up an interesting point. It's important to remain agnostic as to the representation of the canonical data format. It could be XML, Java, CSV, whatever. I think XML offers the widest range of options for an intermediate format, but it's all gonna depend on the environment.
5. Chaining Transformers

tfennelly Jan 19, 2011 10:12 AM (in response to kcbabo)

Keith Babo wrote:

The Smooks example brings up an interesting point. It's important to remain agnostic as to the representation of the canonical data format. It could be XML, Java, CSV, whatever.

Sure.

Keith Babo wrote:

I think XML offers the widest range of options for an intermediate format, but it's all gonna depend on the environment.

Not so sure about that. Using XML as an intermediate format also offers wide-ranging scope for things to go pear-shaped, unless you add the additional overhead of validating the XML canonical form every time, before applying the transform to the target format.

Sure it would be a good idea to validate a Java based canonical model too, but in that case it would be just validation of the content of the model... not its structure too (the compiler has already done that for you). Add to that the fact that you can debug it...

I guess it'd come down to personal preference in a lot of cases (and env issues as you say) and I'm sure some would prefer to use XML.
6. Chaining Transformers

moofish32 Mar 20, 2011 7:06 PM (in response to tfennelly)

Interesting thread. Is there any reason Google Protocol Buffers is not considered versus XML? I think it meets the same criteria as XML with all the self describing characteristics, and it directly solves the serializable issue. In addition to this there are libraries for almost all common languages and Google maintains C++, Python, and Java. However, I think they are lacking in documentation of all the capabilities of GPB. Many of the dynamic features are not well documented nor is the use of field options, but overall my experiences with the software have been extremely positive.
7. Chaining Transformers

tfennelly Mar 21, 2011 7:22 AM (in response to moofish32)

Hi Michael.

Perhaps it's just my lack of experience with GPB, but I'm not sure how it would apply as an option for a canonical data format. In any case... I don't think we'll be doing anything in SwitchYard that would rule anything out. This would be an implementation choice.
8. Chaining Transformers

kcbabo Mar 21, 2011 7:43 AM (in response to tfennelly)

As Tom said, you could really use any format for your canonical message format within SwitchYard. The transformer support simply takes a name for the from and to type and looks for the appropriate transformer to move between them. Generally speaking, people will choose a canonical format which provides straightforward mapping options, since all data formats are supposed to map to the canonical form. XML is one example, since there is a well-defined structure, standard serialization rules, and plentiful tooling options.

To be honest, I'm not all that familiar with GPB, but I just took a quick cruise through their developer guide. Seems like you could have a generalized transformer implementation that takes a .proto file as configuration and maps between an input message and a Java type. So instead of having a transform.java or transform.smooks, it would be a transform.proto. I'm guessing the .proto file could then be passed to external consumers of the service which could use it to create a request message from multiple languages.

Does that sound like what you had in mind? Again, we don't really force SwitchYard users into a specific (or even a single) canonical data format, so you can choose the solution that's best for you.
9. Chaining Transformers

moofish32 Mar 21, 2011 9:00 AM (in response to kcbabo)

Keith,

You are on the same track I am. GPB lacks a true inheritance model, but the extensibility built into it is very powerful. In addition, the reason our program leveraged the technology is the ease of change. When digging in, what we found was that you could define a very minimal set of 'message' properties and make the necessary changes on the fly in code. The reasons one might choose XML are typically the robust serialization tools (marshalling), the ability to be self-describing, the flexibility to define what you need quickly, and the general multi-language support. These capabilities are all available in GPB, and because of the code generation features the marshalling time is considerably faster. So when I see something like Smooks - which I also love and have considered writing .proto capabilities into - I think if the core behavior was in GPB (or a similar technology, e.g. Thrift) the performance would improve greatly without an impact on flexibility. Plus you generally get multiple language support.

Performance Trade Study on Marshallers...

https://github.com/eishay/jvm-serializers/wiki/
10. Chaining Transformers

kcbabo Mar 21, 2011 9:11 AM (in response to moofish32)

Here's another thread that might interest you:
http://community.jboss.org/thread/163838?tstart=0

It's more about our internal serialization requirements, but I think there may be an opportunity to synthesize some of the work we are doing with transformers in that area.
11. Re: Chaining Transformers

kostas_papag Oct 31, 2012 7:26 AM (in response to kcbabo)

Hello,

In my project we want to be able to support different versions of services, and are thus dealing with a large number of formats with nearly identical requirements.
Creating a transformation from every format to every other doesn't seem efficient.
After writing a small example, I realized that with the current version (0.6.0.Beta1) chaining of transformers isn't possible. (JIRA didn't yield any results either.)

Is there some plan for supporting this feature?

Regards,
12. Re: Chaining Transformers

kcbabo Oct 31, 2012 7:45 AM (in response to kostas_papag)

Kostas,

Can you provide a bit more detail on your specific scenario? Transformer chaining is not yet supported in SY, but it is an area that we're interested in addressing.

cheers,
keith
13. Re: Chaining Transformers

kostas_papag Nov 1, 2012 4:51 AM (in response to kcbabo)

Imagine the following simple scenario:

ServiceV1 -> TransformerV1 -> ReferenceV1

At some point we want to introduce a newer version of the service, let's say V2.
Without chaining, we have to create a new transformer so that:

ServiceV2 -> TransformerV2 -> ReferenceV1

Now if chaining were allowed, it would be possible to keep the old transformer and add a new one:

ServiceV2 -> TransformerV2toV1 -> TransformerV1 -> ReferenceV1

Regards,
Kostas
14. Re: Chaining Transformers

kcbabo Nov 1, 2012 7:46 AM (in response to kostas_papag)

That's an interesting perspective. The transformer pairs essentially form a directed graph, so it should be possible to walk the graph to find a multi-step transform.
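As a rough illustration of walking that graph, a breadth-first search over registered from/to pairs finds a route with the fewest steps. The TransformGraph class and its methods are hypothetical, written only for this sketch, and the type names in the usage note are made up for the versioning scenario above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical directed graph of registered from -> to transformer pairs.
final class TransformGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void register(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first search: returns a route with the fewest transformer
    // steps as a list of type names, or an empty list if none exists.
    List<String> findPath(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null); // marks the start node as visited
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.remove();
            if (cur.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = cur; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : edges.getOrDefault(cur, Collections.<String>emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Registering MessageV2 -> MessageV1 and MessageV1 -> ReferenceV1 and then asking for a path from MessageV2 to ReferenceV1 yields the two-step route through MessageV1, so the old V1 transformer is reused unchanged.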