14 Replies Latest reply on Jun 13, 2013 1:34 AM by preeti1987060

    How to capture source data changes in destination datasource using teiid

    preeti1987060 Newbie

      Hi,

      How to track source changes and update in destination using teiid .Suppose I have a source where few columns got deleted ,modified and new records are inserted today.How to perform delta load to track all these using teiid and update destination?

      If any online videos with examples present ,request you to share.

      Thanks in advance.

        • 1. Re: How to capture source data changes in destination datasource using teiid
          Ramesh Reddy Master

          Preeti,

           

          How are the updates being done, through Teiid? If the inserts, updates and deletes are being performed only through Teiid, then you can have triggers to additional stuff what you want elsewhere. However if the updates are occrring independent of Teiid, then currently there is no possible way Teiid knows about them automatically. What we suggest, if you can install a some kind of data capture event source in your physical sources, then event can be propagated to Teiid (may be through JMS etc.) and take necessary actions in the Teiid. Once you capture the event, Teiid does provide a way to handle them. Look for EventDistributor interface in Teiid documents.

           

          As per the deletion of columns usecase

          1) Teiid does not allow DDL command to delete the columns.

          2) Once you delete the columns on your physical source through other means, if you did not update the VDB of the changes you made, then queries against that table through VDB may fail, depending upon usage of non-existent columns in the source.

           

          Sorry, no videos. I will consider a blog article on this subject, but if you want to write something once you figure it out let me know, I will link the article on Teiid blog. BTW, I highly encourage that and appreciate it too. -:)

           

          Thanks

           

          Ramesh..

          • 2. Re: How to capture source data changes in destination datasource using teiid
            preeti1987060 Newbie

            Actually we do incremental loads everything using ETL ,so wanted to know if we can use TEIID in place of ETL tool for the use cases stated above.Could you please explain in detail if there is no direct possibilty of Teiid to handle these usecases.As u are saying Triggers creation on source side would suffice,in that case maintenance cost would be high I suppose as we will b using triggers to capture delete,insert,update.Is there any other way ?

            We do incremental,scd tracking in ETL s ,can t we cover those use cases using Teiid?If not directly,what are all the other tools that can be used with Teiid to suffice the requirements.

             

            Thanks.

            • 3. Re: How to capture source data changes in destination datasource using teiid
              Ramesh Reddy Master

              Can you explain what is SCD tracking in the ETL and how it works? I am not familiar with the term.

               

              It should be possible to replace with the ETL given the usecase is feasible.

               

              Ramesh..

              • 4. Re: How to capture source data changes in destination datasource using teiid
                preeti1987060 Newbie

                scd in etl is slowly changing dimension .There are 3 cases in this.

                 

                1.one is if source data changes,directly truncate and reload the target data.

                2.if source data changes,i.e ,if  a new record gets inserted in source,insert the same in target else update .

                This can be done based on incremental load on daily or monthly or weekly etcc  .

                It also covers delete .

                Example:

                 

                If a week ago,new records got inserted,few records modified and few records got deleted,it should capture the same and update the target as well for insert,update and delete to be in sync with source.(This is one use case)

                 

                The other use case which is complex with ETL also is ,

                For a weekly load,if few columns got added to source(can be dynamic also ) which is SQL and if the target is oracle and after a week ,along with incremental load if you will have to update the  target table structure followed by incremental load of 1st use case ,how easy it is with Teiid to cover these two use cases or either of the use case.If its complex ,what is the procedure to obtain this ?...

                 

                Could you please explain in detail the steps to be followed to cover the use cases mentioned.

                 

                Thanks in advance.

                • 5. Re: How to capture source data changes in destination datasource using teiid
                  Ramesh Reddy Master

                  Preeti,

                   

                  Thank you for the explanation. 

                   

                  Now, a follow up question. In the first scenario, how are sources indicating that there is update in the source table? I mean what is event that makes the ETL tool wake up, and run the either truncate or incremental script?

                   

                  On the second one, if there are additional columns added to the source table, then that usecase can *not* be handled in Teiid. You would need to rebuild the vdb, and re-populate the materialization tables. Currently there is no way t dynamically add columns to the view table in a deployed VDB.

                   

                  Thanks

                   

                  Ramesh..

                  • 6. Re: How to capture source data changes in destination datasource using teiid
                    preeti1987060 Newbie

                    Hi ,

                    Thanks for your quick reply.Basically in Informatica we have look up concept which acts as a transformation .By doing a lookup on the target,if the source incoming primary key is null ,it means its a new record which should be inserted,else it does an update .

                    Regarding truncate option ,before loading into the target there is pre sql command which can be run where different options are provided ,out of which one of them is truncate  and reload.

                     

                    Likewise,do we have any transformations in teiid ?if not how these things are handled.

                     

                    For the second one,rebuilding the vdb might consume more time .Is nt it?These are the real scenerios faced when multiple applications are integrated in real environment.Columns can be dynamically added or deleted at the source side,so each time checking the source manually for any structure level changes and then deleting the target structure and reloading might require huge efforts.Is it not the major blockage if this is not automated?

                    • 7. Re: How to capture source data changes in destination datasource using teiid
                      Ramesh Reddy Master

                      Preeti,

                      preeti1987060 wrote:

                      Basically in Informatica we have look up concept which acts as a transformation .By doing a lookup on the target,if the source incoming primary key is null ,it means its a new record which should be inserted,else it does an update .

                      If *you* are writing these lookups, then same can be done in the Teiid, you can query against the materialized tables and find out if there record associated with the given PK or not and take necessary action of insert/update of the record. The frequency at which you do these lookups is entirely dependent upon your process which is doing these lookups. In the earlier post I was talking about, what is the event that makes your process to engage in that lookup and do a insert/update? The process that supplies the Pk for your lookup is important.

                       

                      Understand that the inserts/updates to the source database(s) are occurring independent of Teiid directly though other legacy applications, thus Teiid can not capture those events. If the inserts/updates are *only* going through the Teiid then it is possible to capture the these events and take appropriate actions with some amount code wiring.

                       

                      preeti1987060 wrote:

                      Regarding truncate option ,before loading into the target there is pre sql command which can be run where different options are provided ,out of which one of them is truncate  and reload.

                       

                      complete reload aka truncate is simplest of scenarios. You can all SYSADMIN.refreshMatView. See this for more details https://docs.jboss.org/author/display/TEIID/Internal+Materialization

                       

                      preeti1987060 wrote:

                       

                      For the second one,rebuilding the vdb might consume more time .Is nt it?These are the real scenerios faced when multiple applications are integrated in real environment.Columns can be dynamically added or deleted at the source side,so each time checking the source manually for any structure level changes and then deleting the target structure and reloading might require huge efforts.Is it not the major blockage if this is not automated?

                       

                      It depends. Teiid is not a ETL tool. I would imagine if there are additional columns added or removed, certainly the down stream applications that are consuming these data also needs to be changed to make use of this newly found/removed data. Similarly, if the Teiid falls in same category, you need to modify its VDB to take advantage of that. Actually this *is* one of the huge benefits of a Data Virtualzation system, where they provide a *abstract* layer that insulate your upstream applications that are consuming data from the down stream database changes.

                       

                      As per rebuilding the VDB, if you are working with Dynamic VDBs, you can either read the new schema from source by redeploying a VDB for physical models, or you can even programically build the Dynamic VDB with both physical and view models using updated DDL and redeploy into server to take advantage of new columns. At the same time if you using a Designer based VDB, yes that does require more hands on approach. So, to answer your question, yes it possible to automate in rebuilding the VDB. It does matter what type of VDB that is.

                       

                       

                      Ramesh..

                      • 8. Re: How to capture source data changes in destination datasource using teiid
                        preeti1987060 Newbie

                        Thanks for the info. Ramesh.I am just lookin an online tutorial where I can try out all these use cases .I have gone throu teiid docs but I need some practical examples if present .Let me check the views concept in teiid and get back to you if any queries.

                         

                        One more question ,is there any limitation for the number of records if huge and also performance wise how does it behave?If  I have a db residing in another network and db residing in local and integration has to take place for huge data sets ,how does it behave?Any performance probs do we face?

                         

                        Thanks again.

                        Preeti

                        • 9. Re: How to capture source data changes in destination datasource using teiid
                          Ramesh Reddy Master

                          Preeti,

                           

                          We have many examples,  start with Quick Start Example, then there are more example in the Teiid download. Then look at the Teiid Designer, there there are more examples. Designer provides a UI environemnt to develop your VDBS.

                           

                          Teiid can handle fairly large datasets as it has pretty good memory management system. Ofcourse more memory you can allocate can make a difference. The performance really depends upon the your sources and type of query you are running. It has lot of optimizations to rewrite the query such that most of the work is pushed to sources and rest is handled in the Teiid layer. Look at the Reference Guide to learn more.

                           

                          Redhat does provide Teiid as a product Enterprise Data Services and it sells subscription for it, and it used by many organizations in their production environments. Teiid is also has long history as metamatrix, check out the history on the Teiid.

                           

                          BTW, I am still interested in your answer about the event that starts out the update script.

                           

                          Thanks

                          Ramesh..

                          • 10. Re: How to capture source data changes in destination datasource using teiid
                            preeti1987060 Newbie

                            Thanks .There is no direct concept such as events in etl,we have a transformation by name Update ,so its like a flow where in it does a lookup on target so whenever a row comes as an input from source it compares it with the target primary key based on that there r transformation operations  that get processed either as insert or update or delete ,so logic would b written in such a way after getting data from source before sending it to target that the transformation logic gets run and then loaded into target as separate routes for insert and update .

                            • 11. Re: How to capture source data changes in destination datasource using teiid
                              Ramesh Reddy Master

                              Preeti,

                               

                              If you could write the same lookup using the Teiid, you could use same technique to keep the rows updated too. The case in Teiid gets little complicated because you may be dealing with multiple data sources and at a view layer which is abstracted over the physical data layer. But certainly possible.

                               

                              What some of our uses have done is much more pro-active integration, where when a record is inserted into the database, or new file gets created and JMS event is generated and then JMS event handler (much like your lookup) will update/insert the row in the materialization. This is ofcourse, is when your tolerance for staleness is data is very low and like to keep the data very fresh or near fresh. Otherwise depending upon the volume of the data, tolerance for freshness and complexity and time taken to render the whole data you can just truncate and regenerate each load/refresh cycle easily.

                               

                              Ramesh..

                              • 12. Re: How to capture source data changes in destination datasource using teiid
                                preeti1987060 Newbie

                                Ok,What if suppose I wanted to store history also in the same db like ,I wantedto store history data for an yr and also the new updates.In that case ,how is that possible with teiid?

                                Example:

                                I wanted to store latest current data (can be real time ) and also history for an yr.After an yr the history data should b moved to a stage table (archived) and from then again history would b maintained with latest.

                                • 13. Re: How to capture source data changes in destination datasource using teiid
                                  Ramesh Reddy Master

                                  Preeti,

                                   

                                  Just have two different views one for the yearly data and one for the current and have both of them materialized, and have the updates to them appropriately. Have another view on top of both these views that unions to be used in your end user applications. Since data is really not copied when you create a view in Teiid they are not expensive to create and maintain as in ETL tool.

                                   

                                  Ramesh..

                                  • 14. Re: How to capture source data changes in destination datasource using teiid
                                    preeti1987060 Newbie

                                    Hi Ramesh,

                                    You said you would be considering a blog article on this subject, is it published now?if so pls do let me know .

                                     

                                    Thanks