4 replies. Latest reply on Jan 8, 2015, 7:27 AM by shiveeta.mattoo

    Additional encodings support for File data source

    shiveeta.mattoo

      Hi,

       

      We would like support for additional encodings when defining a File data source, specifically:

      CP850

      CP500

      MS932

      EBCDIC

       

      Regards,

      Shiveeta
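Whether these names can be used depends on the JVM. The following is a minimal sketch that assumes the file translator passes the encoding name to Java's standard Charset lookup (this thread does not confirm that). It checks which of the requested names the running JVM resolves, and prints each name's canonical form. EBCDIC is a family of code pages rather than a single encoding. CP500 and, for example, CP037 are concrete EBCDIC code pages, so a request for "EBCDIC" has to name one of them. The class name EncodingSupportCheck is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports which encoding names the running JVM can resolve.
// Assumption: the file translator ultimately uses Java charset names (not confirmed).
class EncodingSupportCheck {

    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException (an illegal name) extends IllegalArgumentException.
            return false;
        }
    }

    public static void main(String[] args) {
        // "EBCDIC" is a family name; Cp500 and Cp037 are concrete EBCDIC code pages.
        String[] requested = {"Cp850", "Cp500", "MS932", "Cp037", "EBCDIC"};
        for (String name : requested) {
            String canonical = supported(name) ? Charset.forName(name).name() : "(not available)";
            System.out.println(name + " -> " + canonical);
        }
    }
}
```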

        • 1. Re: Additional encodings support for File data source
          shawkins

          There are several mechanisms for controlling encodings. On the file translator you can set the encoding property; any call to getTextFiles will then return clobs read using that encoding. You can also use the system functions to_bytes and to_chars; see https://docs.jboss.org/author/display/TEIID/String+Functions
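The example below shows why the encoding setting matters. The same bytes decode to different characters depending on the encoding applied, and that encoding is exactly what the translator's encoding property or to_chars selects. This is a plain-Java sketch, not Teiid code, and the class and method names are hypothetical. The byte values are standard code-page assignments.

```java
import java.nio.charset.Charset;

// Illustration only (not Teiid): decoding raw file bytes with a chosen encoding,
// which is the step an encoding setting controls.
class EncodingDecodeSketch {

    // Decode raw bytes using the named Java charset, e.g. "MS932", "Cp850", "Cp500".
    static String decode(byte[] raw, String encoding) {
        return new String(raw, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        // 0xC1 is 'A' in the EBCDIC code page Cp500.
        System.out.println(decode(new byte[] {(byte) 0xC1}, "Cp500"));
        // 0x82 is 'e' with acute accent (U+00E9) in Cp850.
        System.out.println(decode(new byte[] {(byte) 0x82}, "Cp850"));
        // 0x82 0xA0 is hiragana 'a' (U+3042) in MS932 (Windows-31J).
        System.out.println(decode(new byte[] {(byte) 0x82, (byte) 0xA0}, "MS932"));
    }
}
```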

          • 2. Re: Additional encodings support for File data source
            shiveeta.mattoo

            Thanks, Steven.

             

            1. I tried the system function to_chars together with getFiles to get the file as a clob in the desired encoding. The query is:

             

            SELECT A.AddressLine1,A.zip FROM (TO_CHARS((EXEC MS932.getFiles('CP932_Write.txt')),"MS932")) AS F, TEXTTABLE(F.file COLUMNS AddressLine1 STRING,zip STRING delimiter ',' quote ' ' HEADER

            However, I get a query parse exception:

            org.teiid.api.exception.query.QueryParserException: TEIID31100 Parsing error: Encountered "FROM (TO_CHARS[*]([*](EXEC" at line 1, column 43.

            Was expecting: "as" | "cross" | "full" | "inner" | "join" | "left" | "makedep" | "makenotdep" | "right" | "union" ...

             

            Please let me know what is wrong with the query syntax.

             

            2. I am concerned that this approach may have a performance impact, since there is an additional step of converting the blob to the desired encoding after the getFiles invocation.

            If I set the encoding at the translator level, I understand it would apply to all VDB deployments, since EmbeddedServer.java maintains a map of translator instances that are reused across the VDB deployments for a data source.

             

            Is there any other way to set the translator properties at runtime, so that it can be uniquely specified for each VDB deployment invocation ?
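The concern above can be modelled in a few lines. The following is a hypothetical sketch, not EmbeddedServer's actual code. It assumes only what the post describes: a map of translator instances, keyed by name, that every VDB deployment reuses. Because the instance is shared, an encoding set through one deployment is seen by all of them. FileTranslator, addTranslator and deploy here are illustrative stand-ins, not Teiid signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Teiid's EmbeddedServer): translators are stored in a map
// keyed by name, and every deployment that names a translator gets the same instance.
class TranslatorRegistrySketch {

    // Hypothetical stand-in for a file translator with an encoding property.
    static final class FileTranslator {
        private String encoding = "UTF-8";
        void setEncoding(String encoding) { this.encoding = encoding; }
        String getEncoding() { return encoding; }
    }

    private final Map<String, FileTranslator> translators = new HashMap<>();

    void addTranslator(String name, FileTranslator translator) {
        translators.put(name, translator);
    }

    // A "deployment" here only resolves the translator by name; no copy is made.
    FileTranslator deploy(String vdbName, String translatorName) {
        FileTranslator t = translators.get(translatorName);
        if (t == null) {
            throw new IllegalArgumentException(
                    "No translator named " + translatorName + " for VDB " + vdbName);
        }
        return t;
    }

    public static void main(String[] args) {
        TranslatorRegistrySketch server = new TranslatorRegistrySketch();
        server.addTranslator("file", new FileTranslator());

        FileTranslator forVdbA = server.deploy("vdbA", "file");
        FileTranslator forVdbB = server.deploy("vdbB", "file");
        forVdbA.setEncoding("MS932");
        // Both deployments share one instance, so vdbB sees the change: prints MS932
        System.out.println(forVdbB.getEncoding());
    }
}
```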

            • 3. Re: Additional encodings support for File data source
              shawkins

              1. You need to apply the function to the scalar argument, not to the FROM clause:

               

              SELECT ... FROM (EXEC MS932.getFiles('CP932_Write.txt')) AS F, TEXTTABLE(TO_CHARS(F.file,"MS932") ...


              2. On performance: the TO_CHARS function accepts an optional well-formed flag that lets you skip validation. This is nearly the same as what the translator does when you ask it to get a text file.
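The two points above can be sketched together in plain Java. This is not Teiid's implementation. The class, method and flag names are hypothetical, and the text-table step is deliberately simplified: columns are matched by position, quotes are not handled, and the first line is skipped as the header. The sketch applies the conversion to each row's file value, as in the corrected query. It also includes a well-formed switch that either validates the input or trusts it. It shows only what skipping validation means, not Teiid's performance characteristics.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Illustration only, not Teiid code. Mirrors the shape of
//   TEXTTABLE(TO_CHARS(F.file,"MS932") COLUMNS AddressLine1 STRING, zip STRING delimiter ',' HEADER)
// for a single file: the conversion is applied to the file value of a row.
class PerRowDecodeSketch {

    static final class Address {
        final String addressLine1;
        final String zip;
        Address(String addressLine1, String zip) {
            this.addressLine1 = addressLine1;
            this.zip = zip;
        }
    }

    // wellFormed == false: validate, throwing on malformed or unmappable bytes.
    // wellFormed == true: trust the input; any bad bytes silently become U+FFFD.
    static String toChars(byte[] file, String encoding, boolean wellFormed)
            throws CharacterCodingException {
        Charset cs = Charset.forName(encoding);
        if (wellFormed) {
            return new String(file, cs);
        }
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(file)).toString();
    }

    // Simplified text table: positional columns, no quote handling, header line skipped.
    static List<Address> textTable(String text) {
        String[] lines = text.split("\r?\n");
        List<Address> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue;
            }
            String[] cols = lines[i].split(",", -1);
            rows.add(new Address(cols[0], cols.length > 1 ? cols[1] : null));
        }
        return rows;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Hypothetical file contents, stored as MS932 bytes.
        byte[] file = "AddressLine1,zip\n\u6771\u4eac 1-2-3,100-0001\n"
                .getBytes(Charset.forName("MS932"));
        for (Address a : textTable(toChars(file, "MS932", false))) {
            System.out.println(a.addressLine1 + " | " + a.zip);
        }
    }
}
```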


              > Is there any other way to set the translator properties at runtime, so that it can be uniquely specified for each VDB deployment invocation ?


              Are you looking for VDB-scoped translators in embedded mode? Those were left out of embedded for simplicity; you would have to log an enhancement request for that.


              • 4. Re: Additional encodings support for File data source
                shiveeta.mattoo

                - Thank you, Steven. The query worked with the specified changes.

                - I have also raised an enhancement request for VDB-scoped translators: TEIID-3280.