12 Replies Latest reply on Jan 29, 2018 5:06 PM by andriyg

    Wrong result set for query against Mongo and MSSql datasources

    andriyg

      hello,

      I have two datasorces: MongoDB and MS SQL Server. And trying to perform join across these datasources with TEIID

       

      here is DDL generated for tables I'm trying to join

       

      CREATE FOREIGN TABLE test_table_mssql (

        id_column string(250) OPTIONS (NAMEINSOURCE '"id_column"', NATIVE_TYPE 'varchar'),

        numeric_col integer OPTIONS (NAMEINSOURCE '"numeric_col"', NATIVE_TYPE 'int')

      ) OPTIONS (NAMEINSOURCE '"foodmart"."dbo"."test_table_mssql"', UPDATABLE TRUE, CARDINALITY 31);

       

      CREATE FOREIGN TABLE MONGO_COLUMNS (

        MONGO_ID_COLUMN string(255) OPTIONS (NAMEINSOURCE '"MONGO_ID_COLUMN"', CHAR_OCTET_LENGTH 510, NATIVE_TYPE 'STRING'),

        "_ID" string(24) NOT NULL OPTIONS (NAMEINSOURCE '"_ID"', CHAR_OCTET_LENGTH 48, NATIVE_TYPE 'STRING'),

        MONGO_NUMERIC_COLUMN integer OPTIONS (NAMEINSOURCE '"MONGO_NUMERIC_COLUMN"', NATIVE_TYPE 'INTEGER'),

        CONSTRAINT SYS_PK_10147 PRIMARY KEY("_ID"),

        CONSTRAINT SYS_IDX_10146 UNIQUE("_ID")

      ) OPTIONS (NAMEINSOURCE '"TEST"."MONGO_COLUMNS"', UPDATABLE TRUE);

       

      Query to select MONGO_COLUMNS.MONGO_ID_COLUMNS

       

      select "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" as "mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN"

      from "mongodb_jdbc_ds_TEST"."MONGO_COLUMNS" "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"

      limit 1000

       

      returns 15 records: id1 id3 id5 id7 id9 11 13 15 17 19 id21 id23 id25 id27 id29

       

      while query to select "sql_serv_native_jdbc_dbo_test_table_mssql"."id_column"

       

      select "sql_serv_native_jdbc_dbo_test_table_mssql"."id_column" as "sql_serv_native_jdbc_dbo_test_table_mssql_id_column"

      from "sql_serv_native_jdbc_dbo"."test_table_mssql" "sql_serv_native_jdbc_dbo_test_table_mssql"

      limit 1000

       

      returns 31 record: id99 id1 id2 id3 id4 id5 id6 id7 id8 id9 id10 11 12 13 14 15 16 17 18 19 id20 id21 id22 id23 id24 id25 id26 id27 id28 id29 id30

       

      FINALLY query from both joined datasources

       

      select "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" as "mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN",

      "sql_serv_native_jdbc_dbo_test_table_mssql"."id_column" as "sql_serv_native_jdbc_dbo_test_table_mssql_id_column"

      from "mongodb_jdbc_ds_TEST"."MONGO_COLUMNS" "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"

      inner join "sql_serv_native_jdbc_dbo"."test_table_mssql" "sql_serv_native_jdbc_dbo_test_table_mssql" on (("mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" = "sql_serv_native_jdbc_dbo_test_table_mssql"."id_column"))

      limit 1000

       

      returns only 10 records:  id1 id3 id5 id7 id9 id21 id23 id25 id27 id29

       

      skipping integral values: numbers like 11, 13, 15 etc, while it contains string values: id1, id3 etc

       

      This issue is reproduced only when I try to perform join across MondDB and MS SQL Server datasources

      I was NOT able to reproduce it for

      1. Mysql to MongoDB join

      2. Mysql to MS SQL join

      3. MS Sql to MS Sql joins through teiid

        • 1. Re: Wrong result set for query against Mongo and MSSql datasources
          shawkins

          Can you post the query plan for the MongoDB to SQL Server join?

           

          Seeing how the join is being processed - in particular where the predicates are being evaluated will let us know how to address the issue.

          • 2. Re: Wrong result set for query against Mongo and MSSql datasources
            andriyg

            ok, I've enable query plan for connection. Here is link for file with plan

             

            here is plan description

            • 3. Re: Wrong result set for query against Mongo and MSSql datasources
              shawkins

              Can you try preventing the creation of the dependent join:

              ...

              from /*+ MAKENOTDEP */ "mongodb_jdbc_ds_TEST"."MONGO_COLUMNS" "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"

              ...

               

              That should confirm that the over constraining is coming from the push down of the predicate "mongodb_jdbc_ds_TEST.MONGO_COLUMNS.MONGO_ID_COLUMN IN (<dependent values>)"  - as from the the plan description we see only 10 rows coming from the mongodb source query.

               

              As to why there may be an issue from MS SQL Server and the not the other databases may have to do with something non-obvious with the values, such as padding - what does the DDL definition of the table look like in MS SQL Server?

              • 4. Re: Wrong result set for query against Mongo and MSSql datasources
                andriyg

                hello,

                thank you for following my case up

                 

                I"ve tried to turn /*+MAKENOTDEP*/ macro and now query returns RIGHT resultset.

                 

                select "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" as "mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN",

                "sql_serv_native_jdbc_dbo_test_table_mssql"."id_column" as "sql_serv_native_jdbc_dbo_test_table_mssql_id_column"

                from  /*+ MAKENOTDEP */ "mongodb_jdbc_ds_TEST"."MONGO_COLUMNS" "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"

                inner join "sql_serv_native_jdbc_dbo"."test_table_mssql" "sql_serv_native_jdbc_dbo_test_table_mssql" on (("mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" = "sql_serv_native_jdbc_dbo_test_table_mssql"."id_column"))

                limit 1000

                 

                resulting: id1 id3 id5 id7 id9 11 13 15 17 19 id21 id23 id25 id27 id29

                 

                Here is query plan description

                here is ms sql server table DDL... nothing special

                 

                create table test_table_mssql

                (

                id_column varchar(250),

                numeric_col int(10)

                )

                go

                • 5. Re: Wrong result set for query against Mongo and MSSql datasources
                  andriyg

                  I've decided to generate query plan for similar queries for mysql-mongo and postgres-mongo datasources: both generates right resultset

                   

                  query plan and query for mysql-mongodb case:

                   

                  select "MysqlCase1558724DS_test_mysql_columns"."id_col" as "MysqlCase1558724DS_test_mysql_columns_id_col",

                  "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" as "mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN"

                  from "mongodb_jdbc_ds_TEST"."MONGO_COLUMNS" "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"

                  inner join "MysqlCase1558724DS"."test_mysql_columns" "MysqlCase1558724DS_test_mysql_columns" on (("mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" = "MysqlCase1558724DS_test_mysql_columns"."id_col"))

                  limit 1000

                   

                  query plan and query for postgres-mongodb case:

                   

                  select "Postgres_Foodmart_public_test_mysql_columns"."id_col" as "Postgres_Foodmart_public_test_mysql_columns_id_col",

                  "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" as "mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN"

                  from "mongodb_jdbc_ds_TEST"."MONGO_COLUMNS" "mongodb_jdbc_ds_TEST_MONGO_COLUMNS"

                  inner join "Postgres_Foodmart_public"."test_mysql_columns" "Postgres_Foodmart_public_test_mysql_columns" on (("mongodb_jdbc_ds_TEST_MONGO_COLUMNS"."MONGO_ID_COLUMN" = "Postgres_Foodmart_public_test_mysql_columns"."id_col"))

                  limit 1000

                   

                  so yes, in both cases as seen from query plans there was NO dependant join generated so resultsets are right.

                   

                  PS. Is there some way to reveal/display real <dependent values>? Though I understand that in this case it's obvious that they will not contain integral values

                  • 6. Re: Wrong result set for query against Mongo and MSSql datasources
                    shawkins

                    > so yes, in both cases as seen from query plans there was NO dependant join generated so resultsets are right.

                     

                    The next best test then is to directly issue the mogodb query against Teiid with the in predicate:

                     

                    SELECT mongodb_jdbc_ds_TEST.MONGO_COLUMNS.MONGO_ID_COLUMN FROM mongodb_jdbc_ds_TEST.MONGO_COLUMNS WHERE mongodb_jdbc_ds_TEST.MONGO_COLUMNS.MONGO_ID_COLUMN IN ('id99', 'id1', ...)

                     

                    Since it doesn't appear related to the values themselves, it's likely an issue with the MongoDB translator logic and the handling of what is effectively a mixed type column.  If in the underlying documents the integral values are stored as numbers, and not strings, there search logic may not be appropriate.

                     

                    > PS. Is there some way to reveal/display real <dependent values>? Though I understand that in this case it's obvious that they will not contain integral values

                     

                    The full source query can be seen command log at a debug/detail level.  You can also direct the translator to not use bindings for dependent joins or turn of bindings entirely.

                    • 7. Re: Wrong result set for query against Mongo and MSSql datasources
                      rareddy

                      Translator goes by metadata of the document it sniffed from first few documents at the VDB deployment time. If there is a mismatch in types, yes that will be an issue that Teiid will not be able to detect.

                      • 8. Re: Wrong result set for query against Mongo and MSSql datasources
                        andriyg

                        query with following plan description

                         

                         

                        AccessNode

                          + Relational Node ID:0

                          + Output Columns:mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN (string)

                          + Statistics:

                            0: Node Output Rows: 1

                            1: Node Next Batch Process Time: 20

                            2: Node Cumulative Next Batch Process Time: 20

                            3: Node Cumulative Process Time: 767

                            4: Node Next Batch Calls: 2

                            5: Node Blocks: 1

                          + Cost Estimates:Estimated Node Cardinality: 1000.0

                          + Query:SELECT g_0.MONGO_ID_COLUMN AS c_0 FROM mongodb_jdbc_ds_TEST.MONGO_COLUMNS AS g_0 WHERE g_0.MONGO_ID_COLUMN IN ('id99', 'id1', '11') LIMIT 1000

                          + Model Name:mongodb_jdbc_ds_TEST

                          + Data Bytes Sent:18

                          + Planning Time:1596

                         

                         

                        returns one string value

                         

                        "id1"

                         

                        while query with plan:

                         

                        AccessNode

                          + Relational Node ID:0

                          + Output Columns:mongodb_jdbc_ds_TEST_MONGO_COLUMNS_MONGO_ID_COLUMN (string)

                          + Statistics:

                            0: Node Output Rows: 15

                            1: Node Next Batch Process Time: 1

                            2: Node Cumulative Next Batch Process Time: 1

                            3: Node Cumulative Process Time: 372

                            4: Node Next Batch Calls: 2

                            5: Node Blocks: 1

                          + Cost Estimates:Estimated Node Cardinality: 1000.0

                          + Query:SELECT g_0.MONGO_ID_COLUMN AS c_0 FROM mongodb_jdbc_ds_TEST.MONGO_COLUMNS AS g_0 LIMIT 1000

                          + Model Name:mongodb_jdbc_ds_TEST

                          + Data Bytes Sent:103

                          + Planning Time:26

                         

                         

                        returns 15 string values:

                         

                        "id1" "id3" "id5" "id7" "id9" "11" "13" "15" "17" "19" "id21" "id23" "id25" "id27" "id29"

                        • 9. Re: Wrong result set for query against Mongo and MSSql datasources
                          andriyg

                          Instead of native driver but JDBC one for Mongo DB and configured for that purpose JDBCExecutionFactory, therefore I suppose we experience current issues.

                           

                          Could you suggest, is there some posibility, as a workaround, simply disable dependent joins in case of MongoDB?

                           

                          I was trying to extend JDBCExecutionFactory for mongodb and set enableDependentJoins property to false, but without success - TEIID still generates that DependentAccessNode with query

                           

                          SELECT mongodb_jdbc_ds_TEST.MONGO_COLUMNS.MONGO_ID_COLUMN FROM mongodb_jdbc_ds_TEST.MONGO_COLUMNS WHERE mongodb_jdbc_ds_TEST.MONGO_COLUMNS.MONGO_ID_COLUMN IN (<dependent values>)

                           

                          Aside from disabling dependent joins, maybe there is another way to tweak JDBCExecutionFactory to make it handle correctly fields type mapping, is there?

                          • 10. Re: Wrong result set for query against Mongo and MSSql datasources
                            shawkins

                            > returns one string value

                             

                            That confirms this is an issue with matching against mongodb documents with a mixed type (string and integer) column. 

                             

                            > Could you suggest, is there some posibility, as a workaround, simply disable dependent joins in case of MongoDB?

                             

                            It's not really dependent joins that are the issue - this would happen on any such query that searches against the MONGO_ID_COLUMN.  Can you capture an issue on that - we'll see if the translator can handle that situation better and if not add some doc note.

                             

                            For a workaround, since the integer values are implicitly convertable to string for projection, you need to disable comparison for the MONGO_ID_COLUMN and let the engine handle it:

                             

                            ...

                            MONGO_ID_COLUMN string(255) OPTIONS (NAMEINSOURCE '"MONGO_ID_COLUMN"', CHAR_OCTET_LENGTH 510, NATIVE_TYPE 'STRING', searchable 'unsearchable'),

                            ...