6 Replies Latest reply on Mar 3, 2017 2:12 PM by rareddy

    HDFS as a source from Teiid

    sanjay_chaturvedi

      We have a file stored in HDFS on a Hadoop cluster on some box. Is there any way in Teiid to use this file as a source to create a view model?

      We have successfully accessed Hadoop objects using Hive, but will it work the same way for a JSON file distributed over the cluster?

       

      Please assist.

       

      Thanks,

      Sanjay

        • 1. Re: HDFS as a source from Teiid
          rareddy

          Sanjay,

           

          Unfortunately, there is no HDFS-based resource adapter, which is what one needs. With such an adapter in place, you could use TEXTTABLE to parse the contents if they are CSV, or XMLTABLE for JSON. I do not think this is very complicated to do; take a look at the Developer's Guide for how to write a resource adapter. This may be a good opportunity for you to contribute back to the Teiid community.

           

          See the corresponding JIRA: [TEIID-3647] Create native connector to interact with HDFS as a datasource (JBoss Issue Tracker).

           

          Ramesh..

          • 2. Re: HDFS as a source from Teiid
            sanjay_chaturvedi

            Thanks Ramesh, would love to do so.

            By the way, in that case, can we connect to Apache Drill instead? Any pointers, please.

            Do we have a resource adapter and translator for this? Or can we use some similar alternative components?

             

            Thanks.

            • 3. Re: HDFS as a source from Teiid
              rareddy

              Apache Drill is an interesting proposition. This query engine is somewhat similar to Teiid, but has distributed execution capabilities. So, if you are just trying to call it using SQL, then it may be possible using the existing "jdbc-simple" or "jdbc-ansi" translators (we have not tried it).
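              Before involving Teiid Designer or the translators, one way to narrow things down is to check that the Drill JDBC driver works with plain JDBC calls on its own, since both jdbc-simple and jdbc-ansi go through the standard JDBC API. A minimal sketch, assuming drill-jdbc-all is on the classpath and registers org.apache.drill.jdbc.Driver through the JDBC 4 service loader (otherwise call Class.forName on it first), and assuming a Drillbit reachable directly at localhost:31010 via the jdbc:drill:drillbit=host:port URL form (Drill also accepts a ZooKeeper-based jdbc:drill:zk=... form). The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical standalone check, run outside any application server pool.
final class DrillJdbcCheck {

    // Direct-to-Drillbit URL form; host and port are assumptions.
    static String drillbitUrl(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }

    // Opens a connection, runs one query, and counts the rows returned.
    static int countRows(String url, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = drillbitUrl("localhost", 31010);
        // sys.version is one of Drill's built-in system tables.
        System.out.println(countRows(url, "SELECT * FROM sys.version"));
    }
}
```

              If this standalone check already fails, the problem lies in the driver or the Drill setup rather than in Teiid.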

               

              However, there has been some roadmap discussion in Teiid about how we could either leverage Apache Drill or contribute the Teiid optimizer engine to the Apache Drill community. This is a very long pole that would essentially shift the Teiid architecture entirely, so no decisions have been made yet on the future direction.

               

              Ramesh..

              • 4. Re: HDFS as a source from Teiid
                sanjay_chaturvedi

                Hi Ramesh,

                 

                Thanks for the info.

                 

                From Teiid Designer, I also tried to connect to Drill using the JDBC importer. The translators I used were jdbc-ansi and jdbc-simple, but both ended with the following error:

                Caused by: java.lang.NullPointerException
                  at oadd.org.apache.calcite.avatica.AvaticaConnection.isReadOnly(AvaticaConnection.java:176)
                  at org.apache.drill.jdbc.impl.DrillConnectionImpl.isReadOnly(DrillConnectionImpl.java:452)
                  at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.<init>(BaseWrapperManagedConnection.java:199)
                  at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.<init>(LocalManagedConnection.java:62)
                  at org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.getLocalManagedConnection(LocalManagedConnectionFactory.java:336)

                 

                I know it's coming from Drill, but I am completely stuck on this. Any assistance would be great.

                I used drill-jdbc-all-1.9.0.jar for this. This single jar is enough to connect to Drill, as it bundles the required dependencies.
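                The trace shows IronJacamar's BaseWrapperManagedConnection calling isReadOnly() while it sets up the pooled connection, and the shaded Avatica code inside the Drill driver throwing a NullPointerException there. One possible workaround, not tried against Drill in this thread, is to hand the pool a wrapped connection whose isReadOnly() falls back to false when the driver throws. A minimal sketch using java.lang.reflect.Proxy; the class name is hypothetical, and plugging it into a JBoss datasource would additionally need a small delegating java.sql.Driver (not shown) configured as the datasource's driver class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;

// Hypothetical helper: wraps a JDBC connection so that a NullPointerException
// thrown by isReadOnly() is reported as "not read-only" instead of failing.
final class ReadOnlySafeConnection {
    private ReadOnlySafeConnection() {}

    static Connection wrap(Connection delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                // Forward every call unchanged to the real driver connection.
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                // Only the failing call from the stack trace is special-cased.
                if (cause instanceof NullPointerException
                        && "isReadOnly".equals(method.getName())) {
                    return Boolean.FALSE;
                }
                // Everything else (including SQLException) is rethrown as-is.
                throw cause;
            }
        };
        return (Connection) Proxy.newProxyInstance(
                ReadOnlySafeConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```

                Returning false treats the connection as writable, the usual state of a new JDBC connection, but this only masks the symptom; whether the rest of the pool setup then succeeds against Drill 1.9 is untested.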

                 

                Thanks,

                Sanjay

                • 5. Re: HDFS as a source from Teiid
                  sanjay_chaturvedi

                  An update: the JDBC importer still does not work.

                  But even if the connection can somehow be made to work, I found that the source query prefixes each column name with the table name, and Drill does not accept that.

                  For example, this works in Drill:

                  SELECT PNRId FROM dfs.compleat.pnr_view

                  but this does not:

                  SELECT dfs.compleat.pnr_view.PNRId FROM dfs.compleat.pnr_view

                   

                  I tried setting the column name in source to just PNRId, but it always adds the table name as a prefix. Is there any way to skip that? I know this will cause problems when multiple tables have the same column name, but still... any guesses?
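                  What is being asked for here amounts to a different rule for rendering column references in the pushdown SQL: with a single table in the FROM clause, emit the bare column name; with several tables, qualify each column with a short alias so same-named columns stay unambiguous. In Teiid this would presumably live in a Drill-specific translator (for example, one extending JDBCExecutionFactory). The sketch does not use the Teiid API at all; it only illustrates the rendering rule with a hypothetical, string-based builder, and it assumes Drill accepts alias-qualified columns such as t0.id, which was not checked in this thread.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical illustration of how a Drill-specific translator might render
// column references; this is NOT Teiid's translator API.
final class DrillSelectRenderer {
    private DrillSelectRenderer() {}

    // A column that belongs to the table at position tableIndex in the FROM list.
    static final class Column {
        final int tableIndex;
        final String name;

        Column(int tableIndex, String name) {
            this.tableIndex = tableIndex;
            this.name = name;
        }
    }

    static String render(List<String> tables, List<Column> columns) {
        boolean single = tables.size() == 1;
        StringJoiner select = new StringJoiner(", ", "SELECT ", "");
        for (Column c : columns) {
            // Single table: bare name (the form Drill accepted).
            // Several tables: short alias instead of the full table name.
            select.add(single ? c.name : "t" + c.tableIndex + "." + c.name);
        }
        // Join conditions are omitted; only the naming rule is illustrated.
        StringJoiner from = new StringJoiner(", ", " FROM ", "");
        for (int i = 0; i < tables.size(); i++) {
            from.add(single ? tables.get(i) : tables.get(i) + " t" + i);
        }
        return select.toString() + from.toString();
    }
}
```

                  For the single-table query in this thread, render produces SELECT PNRId FROM dfs.compleat.pnr_view, the form that works in Drill.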

                   

                  Thanks,

                  Sanjay

                  • 6. Re: HDFS as a source from Teiid
                    rareddy

                    If this driver does not follow the JDBC rules fairly strictly, then a specific Apache Drill translator is required before it can be used.