What are the natural sources of your data, i.e., ignoring ESB for now.
The natural sources of the data vary, but the majority will arrive as tab-delimited flat files delivered (or pulled via FTP) daily from enterprise applications.
There's going to be a trade-off between message size and processing power required. The larger your messages, the more you're potentially going to have to read in order to massage and then route onward. The smaller your messages, the quicker you can determine where to route to, but the more network overhead you may incur.
One thing that springs to mind would be using some kind of streaming approach, so you're continually streaming the input data across the ESB to the ultimate destination.
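To make the streaming idea concrete, here's a minimal sketch. It assumes the input is a tab-delimited file (per the question) and reads it row by row, flushing fixed-size chunks to a `send_chunk` callback; the callback name and the chunk size are invented stand-ins for whatever your ESB client's publish API and tuning look like.

```python
import csv

CHUNK_SIZE = 1000  # rows per chunk; tune against the size-vs-overhead trade-off

def stream_rows(f, send_chunk):
    """Stream tab-delimited rows from an open file, emitting fixed-size chunks
    instead of holding the entire file in memory."""
    reader = csv.reader(f, delimiter="\t")
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            send_chunk(chunk)
            chunk = []
    if chunk:  # flush the final partial chunk
        send_chunk(chunk)
```

In use, you'd open the FTP-delivered file and pass your publish function: `with open(path, newline="") as f: stream_rows(f, publish)`. The point is that memory stays bounded no matter how large the daily file gets.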
I've actually been discussing the concept of using the ESB for what would normally be considered ETL and/or DB replication use cases with a few different people.
I think one key criterion that will help you make the decision is how much logic should be applied to pieces of the "message". Do you really need transformation, routing, orchestration, and other forms of message/transport mediation to get the data to its final destination? If so, an ESB may be in your future; if not, perhaps an ETL tool or something built into your DB engine makes more sense.
You could load the large message from the FTP server and break it down into "chunks", where the right size of a "chunk" depends on which mediation techniques apply and how much network and storage overhead you are willing to accept.
For instance, let's say I received a single file from my largest business partner. That file contains 3,000 of today's new orders plus e-payment notices for previous orders/invoices. I might wish to break this single file down into those two types of "message": order vs. e-payment notice. Then break each of those down into separate individual items. Perhaps even drill down to actual line items, because at that level I need different forms of transformation, routing, orchestration, business rules, etc. For instance, a given order may request a line item that requires additional federal gov't documentation for export controls on hazardous materials, so there is an entire process around how we sell that one type of product.
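The first split in that example (order vs. e-payment notice) might look something like the sketch below. It assumes, purely for illustration, that each tab-delimited row carries a record-type code (`"ORD"` or `"EPAY"`) in its first field; your partner's file format will dictate the real discriminator.

```python
from collections import defaultdict

def split_by_type(rows):
    """Group raw rows into per-type buckets keyed on the (assumed)
    record-type code in the first field of each row."""
    buckets = defaultdict(list)
    for row in rows:
        record_type, *fields = row
        buckets[record_type].append(fields)
    return buckets

rows = [
    ["ORD", "order-1", "widget", "5"],
    ["EPAY", "invoice-9", "100.00"],
    ["ORD", "order-2", "gadget", "1"],
]
buckets = split_by_type(rows)
# buckets["ORD"] holds the two orders and buckets["EPAY"] the one notice;
# each bucket could then be routed to its own queue, or broken down
# further into individual items and line items.
```

Each successive split is just the same pattern applied one level deeper (orders into items, items into line items), with the mediation logic attached at whichever level needs it.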
Hopefully, this makes some sense. :-)
This is making perfect sense!
It seems that what I might end up with is a hybrid solution: a service that can orchestrate the ETL jobs but also break that large bulk load into smaller chunks and pass them along the ESB. I can then have other services that operate on those chunks as needed, generating the messages that would typically be passed via the ESB. I think for efficiency I would need to shift from one row or data item per message to a bulk format.
In your example of receiving 3,000 orders, creating a message per order or line item is realistic. In my situation, the same approach would leave me with 600,000 orders on average, and generating a message per order is a little overwhelming at this point.
Would it make sense to instead send a message that contains 1,000 or 10,000 orders, so services can operate on them in bulk? For example, a service that goes through those 1,000 orders, checks for a "preferred spender", and generates a message containing the specifics of that order or customer (lame example, but you get the idea ;) ).
Or am I still approaching this from the wrong angle?
A rule service could operate on 10,000 orders in a single message, applying "preferred spender" rules to each order.
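A minimal sketch of that rule service, under invented assumptions: each order in the bulk message is a record with an `order_id`, `customer`, and `total`, and "preferred spender" simply means the order total meets a threshold. The real rule and record shape would come from your business rules engine.

```python
PREFERRED_THRESHOLD = 1000.00  # invented rule: order total that marks a preferred spender

def preferred_spenders(batch):
    """Scan a bulk message of orders and return a (much smaller) result
    message containing only the preferred-spender hits."""
    return [
        {"order_id": o["order_id"], "customer": o["customer"]}
        for o in batch
        if o["total"] >= PREFERRED_THRESHOLD
    ]
```

The appeal of the bulk format shows up here: one inbound message of 10,000 orders produces one small outbound message of hits, rather than 10,000 individual messages each paying its own transport overhead.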