6 Replies Latest reply on Sep 22, 2016 4:43 PM by m.ardito

    Google spreadsheet: text cells size issue...

    m.ardito

      Another one on Google Spreadsheets,

      I am developing something to put later in production, and I'm starting to get real data (ie: not test data) from my google spreadsheets.

       

      the model is like

      <model name="gssmodel">
          <property name="importer.useFullSchemaName" value="false"/>
          <source name="gformsA" translator-name="google-spreadsheet-override" connection-jndi-name="java:/gssmodel"/>
      </model>
      

       

      and the view like

      CREATE VIEW gformsAView(
          ...
          "Descrizione segnalazione"    clob,
          ...
      )
      AS
      SELECT
          ...
          gsA."Descrizione segnalazione",
          ...
      FROM
          (call gssmodel.native('
              worksheet=A;
              query=SELECT A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U'
          )) w,
          ARRAYTABLE(w.tuple 
              COLUMNS 
                  ...
                  "Descrizione segnalazione"    clob,
                  ...
          ) AS gsA
      WHERE
          gsA."timestamp" IS NOT NULL
      

       

      And it works nicely, but I recently found a text cell size issue:

      • The google spreadsheet has a "description" column which contains a sort of "description text" for ech row item, formatted there as "Automatic".
      • Some rows have long texts in the "description" column: more than 4000 chars, and also "rich text" inside (bullet lists, special chars for quotes, etc, I guess some user pastes there texts copied elsewhere...
      • my Teiid model "gssmodel" retrieves a DDL with the column as "string" type (4000 chars limit in Teiid) ("NATIVE_TYPE 'STRING')"
      • SquirrelSQL retrieves the data as "string", witha  "column_size" of 4000
      • but selecting that column returns all real text length as in the google spreadsheet, eg: more than 4000 chars.eg:


       

      Then, since I needed a materialized view of the spreadsheet data, I created it:

      • this view gets the column value through the "gssmodel.native" method, then using ARRAYTABLE to convert that column values
      • but the "description" column offered by the view, always returns a maximum of 4000 chars
      • I tried to define the column (within ARRAYTABLE definition) as "string" or "clob" and:
        • the returned text length is always 4000 (using unix format for line feeds)
        • the returned text seems to be missing (fortunately) some "rich text" charaters causing issues with POstgres/ODBC recordset, later...
      • I also tried to define the view column as blob, and then use the "TO_CHARS" string function ("Return a clob from the blob with the given encoding.")
        but this always failed because the Teiid engine seems to always try to use a "CONVERT" function before, which fails... with something like:

        if my view query was something like
        SELECT TO_CHARS(gsA."Descrizione segnalazione", 'UTF-8') FROM gsA

        I got something this:
        "2016-09-16 10:01:06,903 ERROR [org.teiid.PROCESSOR.MATVIEWS] (Worker5518_QueryProcessorQueue51833) YQVO+HaeqoHu TEIID30015 Failed to load materialized view table #MAT_GFORMSVIEW.GFORMSAPITOVIEW.: org.teiid.api.exception.query.ExpressionEvaluationException: TEIID30328 Unable to evaluate TO_CHARS(convert(array_get(w.tuple, 7), blob), 'UTF-8'): TEIID30384 Error while evaluating function convert

              at org.teiid.query.eval.Evaluator.evaluate(Evaluator.java:728)"

       

      so I guessed this couldn't work...

       

      Since Google spreadsheet seem to have no limit, now, as cell text length (Files you can store in Google Drive - Drive Help ), I guess the Teiid "string" format could be a bit short...

      But somehow, although the model DDL defines the column as "string", Squirrel can get all field content, as if it was a clob... perhaps.

       

      I could live with that, but I need to let my custom view to behave the same, if possible (mostly because I need materialization)... How can I do? (Of course I can post more details, try whatever, to reproduce the issue...)

       

      Thanks

        • 1. Re: Google spreadsheet: text cells size issue...
          shawkins

          > but selecting that column returns all real text length as in the google spreadsheet, eg: more than 4000 chars.eg:

           

          Yes.  That is a not too well documented feature of the value system - anything we create is expected to conform, but the translators are open to return whatever values they want.  The expectation, at least for typical JDBC imports, is that the source column length will be set on the source model appropriately.  However in the case of a native call, we're just using the default metadata.

           

          >  the returned text length is always 4000 (using unix format for line feeds)

           

          If the arraytable and the view are both clob, you will have the whole cell text available to you correct?

           

          > I also tried to define the view column as blob, and then use the "TO_CHARS" string function ("Return a clob from the blob with the given encoding.")

           

          convert(array_get(w.tuple, 7), blob) - should fail if the value is not a binary type already.  So you would need to use to_bytes, then to_chars - but this shouldn't be necessary as using the clob type directly should accomplish the same thing.

           

          > I could live with that, but I need to let my custom view to behave the same, if possible (mostly because I need materialization)... How can I do? (Of course I can post more details, try whatever, to reproduce the issue...)

           

          Can you clarify what is your desired behavior - do you want the length enforced, or do you want the full content?

           

          • 2. Re: Google spreadsheet: text cells size issue...
            m.ardito
            > the returned text length is always 4000 (using unix format for line feeds)

             

            If the arraytable and the view are both clob, you will have the whole cell text available to you correct?

             

            No, atm the view is like that (the above view snippetis taken from the currently active VDB xml), but while the model returns full data, the view stops instead at 4000 chars... weird...

             

            > I also tried to define the view column as blob,

             

            convert(array_get(w.tuple, 7), blob) - should fail if the value is not a binary type already. So you would need to use to_bytes, then to_chars - but this shouldn't be necessary as using the clob type directly should accomplish the same thing.

             

            Yes, I guessed that, at the time, but it was just to expose all tests I did...

             

            > I could live with that, but I need to let my custom view to behave the same, if possible (mostly because I need materialization)... How can I do? (Of course I can post more details, try whatever, to reproduce the issue...)

             

            Can you clarify what is your desired behavior - do you want the length enforced, or do you want the full content?

             

            Sorry that sentence wasn't clear enough.

             

            I meant: since (as it is now) the model returns all available (or so it seems) cell content, I could fetch the values from the model directly, instead of using a view, but I need to have materialized access to that data for performance reasons... so the best solution would be to let the view get all the data that the model can see (ideally, all the data contained in google spreadsheets'cells, whatever the text length)...

             

            I hope it is clearer now...

            • 3. Re: Google spreadsheet: text cells size issue...
              shawkins

              > No, atm the view is like that (the above view snippetis taken from the currently active VDB xml), but while the model returns full data, the view stops instead at 4000 chars... weird...

               

              Using just the clob type, you would expect to see the whole thing.  It looks like the issue is that the implicit array_get operation is truncating the string - but it should not be.  That will require an issue to fix.

               

               

              • 4. Re: Google spreadsheet: text cells size issue...
                m.ardito

                Thanks, I will create issue shortly, [edit: [TEIID-4460] A view field defined as clob, on a google spreadsheeet model, returns data truncated to string type length … ]

                 

                Anyway, here you find a better view of how that field is exposed differently in the model and in the view:

                 

                I double checked, and the view is exactly like above, and I see this DDL in Teiid:

                 

                for the model I find this for the source google spreadsheet

                CREATE FOREIGN TABLE A (
                    ...
                    "Descrizione segnalazione" string OPTIONS (NAMEINSOURCE 'Descrizione segnalazione', UPDATABLE FALSE, NATIVE_TYPE 'STRING'),
                    ...
                ) OPTIONS (NAMEINSOURCE 'A', UPDATABLE TRUE);
                

                 

                while the view has this DDL exposed:

                 

                CREATE VIEW gformsAView (
                    ...
                    "Descrizione segnalazione" clob,
                    ...
                ) 
                AS
                SELECT 
                    ...
                    gsA."Descrizione segnalazione",
                    ...
                FROM (
                    EXEC gssmodel.native('worksheet=A;query=SELECT A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U')) AS w, 
                    ARRAYTABLE(w.tuple 
                        COLUMNS 
                            ...
                            "Descrizione segnalazione" clob, 
                            ...
                    ) AS gsA
                WHERE 
                    gsA."timestamp" IS NOT NULL;
                

                 

                I just omitted other fields, and replaced table/view names to something more understandable.

                 

                Unfortunetely the LENGTH string function seems to not accept columns defined as clobs, as for

                SELECT s."Descrizione segnalazione", length(s."Descrizione segnalazione")
                FROM gformsAView AS s where s.Data='2016-05-18'
                

                which works on the model field, and gives that "4622" length value I posted before

                 

                while trying that in the view I get:

                Error: TEIID30070 Remote org.teiid.api.exception.query.QueryResolverException: TEIID30070 The function 'length(s."Descrizione segnalazione")' is a valid function form, but the arguments do not match a known type signature and cannot be converted using implicit type conversions.
                SQLState:  50000
                ErrorCode: 30070
                

                 

                so I can't obtain a result to get the length of that field's values in the view, but believe me, copied the cell content in a text editor and counted, it's 4000.

                • 5. Re: Google spreadsheet: text cells size issue...
                  shawkins

                  > Unfortunetely the LENGTH string function seems to not accept columns defined as clobs, as for

                   

                  There are a lot of string functions that currently only operate on the string type - [TEIID-1248] Clob usability - JBoss Issue Tracker covers expanding our logic, but the length function has not been specifically called out yet.

                   

                  > so I can't obtain a result to get the length of that field's values in the view, but believe me, copied the cell content in a text editor and counted, it's 4000.

                   

                  I believe you - that will be addressed by TEIID-4460

                  • 6. Re: Google spreadsheet: text cells size issue...
                    m.ardito

                    Thank you Steven,

                    this will surely help us to make users of google spreadsheets more confident that their data can be integrated well within applications.