6 Replies · Latest reply on Dec 4, 2007 9:53 PM by jeffdelong

    Real world example process flow

    jmorgan

      Hey guys, first post so bear with me.

      I'm evaluating JBossESB to solve pretty much the same "real world" problem presented in the May deep-dive slides Burr Sutter put together, and I have a few questions.

      I am basically looking at a scenario where different systems dump bulk data via different methods. Some are flat files over FTP, some are pulled from web services, etc. I want to take this bulk data, massage it as necessary, and deposit it into a data storage area (i.e., an Oracle DB). The idea of leveraging the ESB for content-based routing, as well as utilizing all the other "free" services, is very appealing.

      The issue I have come across is how to do this in an efficient fashion. I'm not looking at a lot of data: for example, one file can contain anywhere from a couple thousand records to just under a million, and that will eventually grow from there. Operating on the data in a "per-row" fashion doesn't seem very efficient when compared to loading it with sqlldr or something similar.

      Is there a recommended solution to this?

      The only thing I can think of would be to have one service that sends the bulk data to the database with sqlldr but also splits the data into rows and fires those rows off to a service that can then inspect the data and do content-based routing.
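
      Something like this rough sketch, for instance. Everything here is hypothetical (placeholder credentials, an existing sqlldr control file, and a stubbed-out routing step), but it shows the two paths side by side:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;

        // Hypothetical service: bulk-load the whole file with sqlldr for speed,
        // then re-read it row by row so each row can be content-routed.
        public class BulkLoadAndSplit {

            public void handle(String dataFile, String controlFile)
                    throws IOException, InterruptedException {
                // 1. Fast path into Oracle: shell out to sqlldr.
                Process sqlldr = new ProcessBuilder(
                        "sqlldr", "userid=user/pass@db",   // placeholder credentials
                        "control=" + controlFile, "data=" + dataFile)
                        .inheritIO().start();
                if (sqlldr.waitFor() != 0) {
                    throw new IOException("sqlldr failed for " + dataFile);
                }

                // 2. Split path: stream the same file and hand each row to routing.
                try (BufferedReader in = new BufferedReader(new FileReader(dataFile))) {
                    String row;
                    while ((row = in.readLine()) != null) {
                        routeRow(row);
                    }
                }
            }

            private void routeRow(String row) {
                // hypothetical stub: wrap the row in an ESB message and deliver
                // it to the content-based routing service
            }
        }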

      I imagine this is a common scenario (maybe not?), so if anyone has any insight I would LOVE to hear it.

      Thanks,
      JD

        • 1. Re: Real world example process flow
          marklittle

          What are the natural sources of your data, i.e., ignoring the ESB for now?

          • 2. Re: Real world example process flow
            jmorgan

            The natural source of the data will vary, but the majority will arrive as tab-delimited flat files delivered (or pulled via FTP) daily from enterprise applications.

            • 3. Re: Real world example process flow
              marklittle

              There's going to be a trade-off between message size and processing power required. The larger your messages, the more you're potentially going to have to read in order to massage and then route onward. The smaller your messages, the quicker you can determine where to route to, but the more network overhead you may incur.

              One thing that springs to mind would be using some kind of streaming approach, so you're continually streaming the input data across the ESB to the ultimate destination.
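
              A minimal sketch of that streaming idea in plain Java (the chunk size and the send() stub are invented; in practice each chunk would be delivered as an ESB message):

                import java.io.BufferedReader;
                import java.io.FileReader;
                import java.io.IOException;
                import java.util.ArrayList;
                import java.util.List;

                // Continuously stream a large file across in fixed-size chunks
                // rather than as one giant message or one message per row.
                public class StreamingForwarder {

                    private static final int CHUNK_SIZE = 1000; // tune vs. network overhead

                    public void stream(String path) throws IOException {
                        List<String> chunk = new ArrayList<String>(CHUNK_SIZE);
                        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                            String line;
                            while ((line = in.readLine()) != null) {
                                chunk.add(line);
                                if (chunk.size() == CHUNK_SIZE) {
                                    send(chunk);   // forward one chunk downstream
                                    chunk = new ArrayList<String>(CHUNK_SIZE);
                                }
                            }
                        }
                        if (!chunk.isEmpty()) {
                            send(chunk);           // flush the final partial chunk
                        }
                    }

                    private void send(List<String> rows) {
                        // hypothetical stub: package the rows into an ESB message
                        // and deliver it toward the ultimate destination
                    }
                }

              Raising CHUNK_SIZE trades fewer, larger messages for more per-message processing, which is exactly the trade-off above.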

              • 4. Re: Real world example process flow
                burrsutter

                I've actually been discussing the concept of using the ESB for what would normally be considered ETL and/or DB replication use cases with a few different people.

                I think one key criterion that will help you make the decision is how much logic should be applied to pieces of the "message". Do you really need transformation, routing, orchestration, and other forms of message/transport mediation to get the data to its final destination? If so, then an ESB may be in your future; if not, then perhaps an ETL tool or something built into your DB engine may make more sense.

                It is possible to load the large message from the FTP server and break it down into "chunks", where the right size of a "chunk" depends on what kinds of mediation techniques apply and how much network and storage overhead you are willing to accept.

                For instance, let's say I received a single file from my largest business partner. That file contains 3,000 of today's new orders plus electronic e-payment notices for previous orders/invoices. I might wish to break down this single file into those two types of "message": order vs. e-payment notice. Then break those down into separate individual items, and perhaps drill down into actual line items, because at that level I need different forms of transformation, routing, orchestration, business rules, etc. For instance, a given order may include a line item that requires additional federal gov't documentation for export controls on hazardous materials, so there is an entire process around how we sell that one type of product.
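
                As a rough illustration of that two-way split (the record layout and the service names are invented for the example):

                  import java.util.ArrayList;
                  import java.util.List;

                  // Split one mixed partner file into per-type chunks (orders vs.
                  // e-payment notices) that can then be routed to different services.
                  public class PartnerFileSplitter {

                      public void split(List<String> records) {
                          List<String> orders = new ArrayList<String>();
                          List<String> epayments = new ArrayList<String>();

                          for (String rec : records) {
                              // assume the first tab-delimited field carries the record type
                              String type = rec.substring(0, rec.indexOf('\t'));
                              if ("ORDER".equals(type)) {
                                  orders.add(rec);
                              } else {
                                  epayments.add(rec);
                              }
                          }

                          dispatch("OrderService", orders);       // each type becomes its own message
                          dispatch("EPaymentService", epayments); // candidate for per-item drill-down
                      }

                      private void dispatch(String serviceName, List<String> chunk) {
                          // hypothetical stub: wrap the chunk in an ESB message and
                          // deliver it to the named service for per-item processing
                      }
                  }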

                Hopefully, this makes some sense. :-)

                Burr

                • 5. Re: Real world example process flow
                  jmorgan

                  This is making perfect sense!

                  It seems that what I might end up with is a hybrid solution: a service that orchestrates the ETL jobs but also breaks that large bulk load into smaller chunks and passes them along the ESB. I can then have other services that operate on those chunks as needed, generating the messages that would typically be passed via the ESB. I think for efficiency I would need to shift the way I work on the messages from one row or data item per message to a bulk format.

                  In your example of receiving 3,000 orders, creating a message per order or line item would be a bit more realistic. Using the same example in my situation would leave me with, on average, 600,000 orders, and generating a message per order is a little overwhelming at this point.

                  Would it make sense instead to send a message that contains 1,000 or 10,000 orders so that services can operate on them in bulk? For example, a service that goes through those 1,000 orders, checks for a "preferred spender", and generates a message containing the specifics of that order or customer (lame example, but you get the idea ;) ).
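
                  Purely as a sketch of what that batch-oriented service could look like (the Order type, the spend threshold, and the emit step are all made up):

                    import java.math.BigDecimal;
                    import java.util.List;

                    // Hypothetical batch service: one inbound message carries ~1,000
                    // orders; the batch is scanned in memory and a small follow-up
                    // message is generated only for the hits.
                    public class PreferredSpenderScanner {

                        private static final BigDecimal THRESHOLD = new BigDecimal("10000"); // invented

                        static class Order {          // stand-in for the real order record
                            String customerId;
                            BigDecimal total;
                        }

                        public void process(List<Order> batch) {
                            for (Order order : batch) {
                                if (order.total.compareTo(THRESHOLD) >= 0) {
                                    emitPreferredSpender(order); // one small message per hit
                                }
                            }
                        }

                        private void emitPreferredSpender(Order order) {
                            // hypothetical stub: build a message with the order/customer
                            // specifics and send it to whatever handles preferred spenders
                        }
                    }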

                  Or am I still approaching this from the wrong angle?

                  • 6. Re: Real world example process flow
                    jeffdelong

                    A rule service could operate on 10,000 orders in a message, applying 'preferred spender' rules to each order.
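
                    For what it's worth, a sketch of that shape against the Drools API of that era (the .drl file name and the fact objects are hypothetical; the rules themselves would live in the rule file):

                      import java.io.InputStreamReader;
                      import java.util.List;

                      import org.drools.RuleBase;
                      import org.drools.RuleBaseFactory;
                      import org.drools.StatefulSession;
                      import org.drools.compiler.PackageBuilder;

                      // Sketch of a rule service over a batch: every order in the
                      // inbound message is inserted as a fact, then the rules fire
                      // once over all of them.
                      public class OrderRuleService {

                          private final RuleBase ruleBase;

                          public OrderRuleService() throws Exception {
                              // 'preferred-spender.drl' is a hypothetical rule file
                              // holding the 'preferred spender' rule
                              PackageBuilder builder = new PackageBuilder();
                              builder.addPackageFromDrl(new InputStreamReader(
                                      getClass().getResourceAsStream("/preferred-spender.drl")));
                              ruleBase = RuleBaseFactory.newRuleBase();
                              ruleBase.addPackage(builder.getPackage());
                          }

                          public void process(List<?> orders) {
                              StatefulSession session = ruleBase.newStatefulSession();
                              try {
                                  for (Object order : orders) {
                                      session.insert(order); // one fact per order in the batch
                                  }
                                  session.fireAllRules();    // rules tag/route the matches
                              } finally {
                                  session.dispose();
                              }
                          }
                      }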