11 Replies Latest reply on Jun 29, 2010 4:02 PM by michael.walker

    Metadata for LDAP Connector

    rareddy

      Mike,

       

      Based on this forum request for modeling the LDAP metadata as schema in the Teiid runtime, what is your opinion on having the LDAP connector provide metadata in the Dynamic VDB scenario?  Secondly, do you envision that Teiid Designer will ultimately have an importer for LDAP, or is it easier to hand-model these sources, since the generated model depends heavily on the user's LDAP data and an importer would be hard to implement?

       

      If this is feasible, I would like to get this on the Teiid roadmap.

       

      Thank you.

       

      Ramesh..

        • 1. Re: Metadata for LDAP Connector
          michael.walker

          Hi Ramesh,

           

          Just noticed this post today (!).

           

          Yes, in general, it would be feasible to implement an importer for the LDAP connector. LDAP systems will provide the metadata needed to browse the system, and to automatically generate the models.

           

          It is easy enough to do the hand-modeling in most cases.

           

          The benefits of adding an LDAP importer imo:

          - reduce user error

          - make it easier to model lots of LDAP attributes at once

          - during a re-import, we could leverage the pre-existing difference analysis tool to determine what has changed since the last import

          - Designer could support "gather source data statistics" to automatically turn on cost-based optimizations for LDAP-based models

          - it could automatically create multiple tables to handle multivalued attributes

          - perhaps the importer code could be leveraged in the dynamic VDB scenario

           

          I don't think this is a high-priority item, since hand-modeling usually suffices, but it would be nice to have.

           

          If we had an API for writing importers, it would make this task much easier -- are there any plans to add one?

          • 2. Re: Metadata for LDAP Connector
            rareddy

            Mike,

             

            This is the only connector that does not expose any dynamic metadata in the 7.0 release. Any chance we can persuade you to guide us in implementing this feature for 7.0 final? Or, if you have some time, you could contribute by implementing this feature.

             

            Thank you.

             

            Ramesh..

            • 3. Re: Metadata for LDAP Connector
              michael.walker

              I'd be happy to give you more input on how it could work.

               

              I'm interested to know what you mean when you say that a connector could expose dynamic metadata. I'm not up-to-speed on the new connector architecture, but the old API did not have any role in exposing metadata to Designer during import. Instead, you had to write an importer, because Designer used a completely different approach to browsing metadata/creating models. But there was no API for writing importers, so all the work was done on a source-by-source basis. Has this changed in Teiid?

               

              I think a larger question to consider is whether you really want to implement this now, because it's going to be non-trivial to implement. What demand is there for the feature?

              • 4. Re: Metadata for LDAP Connector
                rareddy

                Starting with Teiid 6.2, the Teiid Connector API has supported a feature called "dynamic metadata", where a connector can define the metadata that is exposed to the query engine. This is similar to exposing JDBC metadata through the DatabaseMetaData object, but with a much simpler API. With this approach there is no Designer involved.

                 

                This carries over into 7.0, where the previous Connector API is now called the Translator API; the API itself is the same. A Translator differs from the previous connector in that it does not expose any connection semantics, only the translation layer. It fronts a Data Source, which is the connection mechanism to the EIS system. Read more details here.

                 

                With the above changes plus some XML processing changes, all Teiid translators now expose source metadata except for LDAP. We would like to offer the same functionality with LDAP, more for completeness than because of user demand. This will relieve users from using Designer if they want to use LDAP in a "Dynamic VDB" scenario.

                 

                If this is "non-trivial", we can leave it as is for now, but I would like to explore the idea of providing this feature if possible.

                 

                Thanks

                 

                Ramesh..

                • 5. Re: Metadata for LDAP Connector
                  michael.walker

                  OK, I see you're only interested in supporting the dynamic VDB scenario -- I thought you were also interested in supporting the ability to import LDAP-based metadata from Designer.

                   

                  Where can I find some information on the API that was added to support dynamic metadata? And what is your timeline for 7.0 final?

                  • 6. Re: Metadata for LDAP Connector
                    rareddy

                    Check out the "Developer's Guide" on the new Translator API. For the metadata call, see the "getMetadata" method on the "ExecutionFactory" class. The package "org.teiid.metadata" defines the schema-building objects like Table, Procedure, Column, etc.

                     

                    You can take a look at any of the built-in translator code for example usage.

                     

                    Pending a few documentation changes, we are ready for 7.0-final. However, we are waiting to make sure all the tooling requirements are met, which will hopefully be next week.

                     

                    Ramesh..

                    • 7. Re: Metadata for LDAP Connector
                      michael.walker

                      I've reviewed the latest source code examples and spent some time thinking about how this would work for LDAP over some Lebanese food last night. I've sketched a few potential ways to implement it, but I notice a few major issues. I'd be happy to discuss them in detail over the phone, etc., if that makes more sense. I'll try to summarize here.

                       

                      In LDAP, the things that are typically going to be  added/changed/removed over time are:

                      1. A node (i.e. table)

                      2a. An attribute (i.e. column)

                      2b. A multi-valued attribute - could be treated as a table, or a  column

                      3.  An entry and its set of attribute values (i.e. row)

                       

                      The first problem is that users are probably going to want to explicitly name each node that should be represented as a table.

                       

                      In the JDBC translator, we simply fetch all tables that match certain criteria (e.g. fall under a particular schema), and model them. That approach won't work in LDAP because developers will want certain parent nodes and all their children to be considered a single table, whereas other parent nodes should not be modeled as tables, but their children should. This is the difference between using SUBTREE_SCOPE searches vs. using ONELEVEL_SCOPE searches -- there's no reasonable default, and I think users will want to specify it on a node-by-node basis.
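
A minimal sketch of what "per-node scope" means at the JNDI level, assuming the standard javax.naming.directory.SearchControls class. The TableScopes helper and the table names "Groups" and "US Regions" are hypothetical and only serve as illustration; no directory connection is made.

```java
import javax.naming.directory.SearchControls;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: associates each relational table name with the
// JNDI search scope that should be used when querying its base node.
class TableScopes {

    // Build a SearchControls object carrying only the requested scope.
    static SearchControls controlsFor(int scope) {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(scope);
        return controls;
    }

    // Illustrative mapping: one table covers a whole subtree,
    // the other covers only the direct children of its base node.
    static Map<String, SearchControls> example() {
        Map<String, SearchControls> byTable = new LinkedHashMap<>();
        byTable.put("Groups", controlsFor(SearchControls.SUBTREE_SCOPE));
        byTable.put("US Regions", controlsFor(SearchControls.ONELEVEL_SCOPE));
        return byTable;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, SearchControls> e : example().entrySet()) {
            System.out.println(e.getKey() + " -> scope " + e.getValue().getSearchScope());
        }
    }
}
```

The point of the sketch is only that the scope is a property of each table, so a single connector-wide default cannot express the mapping.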

                       

                      The second problem is that there's no easy way to dynamically get all the possible attributes (columns) for entries in a particular node.

                       

                      That's because there's no guarantee that all entries below a particular node will share the same set of attributes. I don't think there's an easy way to get the entire set without inspecting every entry. In practice, the list of attributes among all the entries of a particular subtree might be very similar, but it's still possible we'd miss certain attributes unless we inspected each entry.
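
To make the cost concrete, here is a rough sketch of the "inspect every entry" approach. Entries are modeled as plain in-memory maps (attribute name to value), which is an assumption for illustration; a real implementation would have to page through the search results of the node instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical survey: the candidate column list for a node is the union
// of the attribute names found on every entry below it.
class AttributeSurvey {

    // Visits every entry once; cost grows linearly with the number of entries.
    static Set<String> allAttributeNames(List<Map<String, String>> entries) {
        Set<String> names = new TreeSet<>();
        for (Map<String, String> entry : entries) {
            names.addAll(entry.keySet());
        }
        return names;
    }

    // Two made-up entries that share "cn" but otherwise differ.
    static List<Map<String, String>> sampleEntries() {
        Map<String, String> alice = new LinkedHashMap<>();
        alice.put("cn", "Alice");
        alice.put("mail", "alice@example.com");
        Map<String, String> bob = new LinkedHashMap<>();
        bob.put("cn", "Bob");
        bob.put("telephoneNumber", "555-0100");
        List<Map<String, String>> entries = new ArrayList<>();
        entries.add(alice);
        entries.add(bob);
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(allAttributeNames(sampleEntries()));
    }
}
```

Sampling only the first few entries would be cheaper, but in this example it would miss "telephoneNumber" if only the first entry were inspected.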

                       

                      Another issue to deal with is multi-valued attributes. Users might want them modeled as tables, or might want them modeled as columns, with all values concatenated together. Teiid currently doesn't support either approach and some further implementation work would need to be done.
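
The two modeling choices can be sketched as follows. This is only an illustration of the resulting shapes, not code from the connector; the class name, method names, and the sample DN and values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two ways to expose a multi-valued attribute.
class MultiValued {

    // Option 1: a single column holding all values joined by a delimiter.
    // Note: this is lossy if a value can itself contain the delimiter.
    static String concat(List<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    // Option 2: a child table with one (entry DN, value) row per value.
    static List<String[]> toRows(String entryDn, List<String> values) {
        List<String[]> rows = new ArrayList<>();
        for (String value : values) {
            rows.add(new String[] { entryDn, value });
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> mails = Arrays.asList("a@example.com", "b@example.com");
        System.out.println(concat(mails, ","));
        for (String[] row : toRows("cn=Alice,ou=Users,dc=company,dc=com", mails)) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```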

                       

                      So, I'm not convinced that it would be useful or easy to dynamically expose LDAP-based metadata and turn it into a relational source view, at least not without requiring users to explicitly list tables and columns, which defeats the purpose of "dynamic metadata".

                       

                      I think there are other improvements that would be far more useful -- for example, extending support for multi-valued attributes, and implementing an LDAP importer for Designer. Extending support for multi-valued attributes is actually something I implemented in part, but never got to add to the Teiid source code due to time constraints. If this sounds useful, I still have the code, and I could try to fit it in for 7.0.

                       

                      However, one item that comes out of the dynamic metadata capability that I do see as potentially useful is the ability to use a configuration file to define models. This avoids the use of Designer and supports embedded scenarios, etc.

                       

                      Because the LDAP connector is the only connector that doesn't support dynamic metadata, this currently means that it's also the only connector that doesn't support model definitions in configuration files. Therefore, it's the only connector that requires the use of Designer. This is really a separate issue that could be solved by creating a config file schema that allows users to explicitly define tables/columns for a given source model. I could see this as being potentially useful, even though it would not be dynamic.

                      • 8. Re: Metadata for LDAP Connector
                        rareddy
                        This is the difference between using SUBTREE_SCOPE searches vs. using ONELEVEL_SCOPE searches -- there's no reasonable default, and I think users will want to specify it on a node-by-node basis.

                        There is a facility where you can define the "import" specific properties on the Translator to define the behavior of the dynamic metadata. Will that help here?

                         

                        That's because there's no guarantee that all entries below a particular node will share the same set of attributes. I don't think there's an easy way to get the entire set without inspecting every entry. In practice, the list of attributes among all the entries of a particular subtree might be very similar, but it's still possible we'd miss certain attributes unless we inspected each entry.

                        So, are you saying that a node defines a "common" set of attributes, and each entry can have additional attributes? And the entry may or may not define the "common" attributes? Since the entries could number in the thousands or tens of thousands, how is this handled in the manual model building scenario by the user?

                         

                        Another issue to deal with is multi-valued attributes. Users might want them modeled as tables, or might want them modeled as columns, with all values concatenated together. Teiid currently doesn't support either approach and some further implementation work would need to be done.

                        IMO, concat is fine, as this does not define a separate table. With the new TEXTTABLE function, I can convert that text into a table anytime I want, if I need to.

                         

                        I still have the code, and I could try to fit it in for 7.0.

                        Before 7.1 is fine too for this feature.

                         

                        However, one item that comes out of the dynamic metadata capability that I do see as potentially useful is the ability to use a configuration file to define models. This avoids the use of Designer and supports embedded scenarios, etc.

                        This is something we are considering for future releases, but the configuration file will be defined using "DDL". Also, the direction of Teiid is to define its metadata independently of the Designer-based metadata. Teiid would like to define its view layers etc. using a DDL-like language, and tooling will be generating that metadata based on its models in 8.0 and beyond.

                        • 9. Re: Metadata for LDAP Connector
                          michael.walker

                          Ramesh Reddy wrote:

                           

                          This is the difference between using SUBTREE_SCOPE searches vs. using ONELEVEL_SCOPE searches -- there's no reasonable default, and I think users will want to specify it on a node-by-node basis.

                          There is a facility where you can define the "import" specific properties on the Translator to define the behavior of the dynamic metadata. Will that help here?

                           

                          What I'm getting at is that these import properties will now have to include information about each table, because users are almost always going to want to define the search scope on a table-by-table basis.

                           

                          An example might help here.

                           

                          It might seem nice if we could just do something like this in the .def file:

                           

                          <Model>
                               <Property Name="importer.rootDN" Value="dc=company,dc=com" />
                               <Property Name="importer.TableFilter" Value="(objectClass=organizationalUnit)" />
                               <Property Name="importer.SearchScope" Value="SUBTREE_SCOPE" />
                          </Model>

                           

                          This would mean that every OU node should be turned into a table, and that the table should include everything in its subtree as rows. But this is not likely to be what the users want, because (a) they don't want to use subtree scope for every OU, just some of them, and (b) it would produce a tremendous number of tables.

                           

                          I think it's more likely that people will want to specify the location and scope for each table. This will require explicitly naming each table, e.g.:

                           

                          <Model>

                               <Table>

                                    <Name>Groups</Name>

                                    <DN>ou=Groups,dc=company,dc=com</DN>

                                    <SearchScope>SUBTREE_SCOPE</SearchScope>

                                    <SearchFilter />

                                    ...

                               </Table>

                               <Table>

                                    <Name>US Regions</Name>

                                    <DN>ou=US,ou=North America,ou=Groups,dc=company,dc=com</DN>

                                    <SearchScope>ONELEVEL_SCOPE</SearchScope>

                                    ...

                               </Table>

                          </Model>
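
As a rough sketch of how such a file could be consumed, the following reads the table definitions with the JDK's DOM parser. The element names follow the example above, which is itself only a proposal; the TableDefParser class and its SAMPLE document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the proposed per-table configuration format.
class TableDefParser {

    static final String SAMPLE =
        "<Model>"
        + "<Table><Name>Groups</Name><DN>ou=Groups,dc=company,dc=com</DN>"
        + "<SearchScope>SUBTREE_SCOPE</SearchScope><SearchFilter /></Table>"
        + "<Table><Name>US Regions</Name>"
        + "<DN>ou=US,ou=North America,ou=Groups,dc=company,dc=com</DN>"
        + "<SearchScope>ONELEVEL_SCOPE</SearchScope></Table>"
        + "</Model>";

    // Returns one {name, dn, scope} triple per <Table> element.
    static List<String[]> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList tables = doc.getElementsByTagName("Table");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                result.add(new String[] {
                    text(table, "Name"), text(table, "DN"), text(table, "SearchScope") });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse model definition", e);
        }
    }

    // Text of the first child element with the given tag, or "" if absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        for (String[] t : parse(SAMPLE)) {
            System.out.println(t[0] + " @ " + t[1] + " (" + t[2] + ")");
        }
    }
}
```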

                           

                          And of course, the drawback with this approach is that table discovery is no longer dynamic, because we're statically naming every table. Also, it requires a change to the schema for your .def files. And we're basically doing modeling in a config file, when it sounds like the future is leaning towards dynamic modeling using DDL.

                          That's because there's no guarantee that all entries below a particular node will share the same set of attributes. I don't think there's an easy way to get the entire set without inspecting every entry. In practice, the list of attributes among all the entries of a particular subtree might be very similar, but it's still possible we'd miss certain attributes unless we inspected each entry.

                          So, are you saying that a node defines a "common" set of attributes, and each entry can have additional attributes? And the entry may or may not define the "common" attributes? Since the entries could number in the thousands or tens of thousands, how is this handled in the manual model building scenario by the user?

                          No, a node actually does not define a common set of attributes. A node is similar to a directory in a file system, and could contain anything.

                           

                          For example, on a file system, I might have:

                           

                          /temp/mystuff:

                          myfile.txt

                          myotherfile.txt

                          mypic.jpg

                          myvid.avi

                           

                          The files may or may not share any common properties. If we wanted to expose all the properties of all the contents of the 'mystuff' directory, we'd have to survey all the entries to get a list of file types (txt, jpg, avi), and then look up the properties for each type.

                           

                          Similarly in LDAP, you can have a directory node that contains entries of many different types. Each type has certain properties defined for it, some mandatory, others optional. The properties for each object class can be derived by looking at the LDAP schema. However, in order to determine all the possible properties in all the entries for a particular node, you'd have to survey all the entries.

                           

                          In practice, people typically put entries of the same class together, e.g. the "Users" directory will contain nothing but entries of type "user". So in a manual modeling situation, when people are modeling the "Users" directory as a table, they can add all the user attributes as columns. If we were to automate this process, there would be no easy way to infer that a particular node contains a particular type of entry (at least, not to my knowledge).

                           

                          If we conceded that users must explicitly name each directory node they wish to represent as a table (along with the search scope), then they could conceivably specify the object class(es) that will be contained in each node. Then, we could dynamically derive the attributes from those specific object classes, represent them as columns, and ignore the rest.

                           

                          Going back to the previous example, it might look like this:

                           

                          <Model>

                               <Table>

                                    <Name>Users</Name>

                                    <DN>ou=Users,dc=company,dc=com</DN>

                                    <SearchScope>SUBTREE_SCOPE</SearchScope>

                                    <SearchFilter>(objectClass=user)</SearchFilter>

                                    <ObjectClasses>

                                         <class>user</class>

                                         <class>organizationalPerson</class>

                                    </ObjectClasses>

                                    ...

                               </Table>

                               ...

                          </Model>

                           

                          With this information, we could dynamically look up all the attributes for the classes "user" and "organizationalPerson" in the LDAP schema and use them to determine the columns of the "Users" table. We'd still be statically defining the tables, as before.
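
The column derivation could be sketched like this. The schema is an in-memory stand-in with made-up attribute lists, not the real definitions of these object classes; against a live server the lists would have to come from the directory's schema (JNDI's DirContext.getSchema is one possible source). The ColumnDeriver class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical derivation of table columns from the object classes
// that the user declared for a table.
class ColumnDeriver {

    // Stand-in schema: object class -> attribute names (illustrative only).
    static Map<String, List<String>> sampleSchema() {
        Map<String, List<String>> schema = new LinkedHashMap<>();
        schema.put("user", Arrays.asList("cn", "sAMAccountName", "mail"));
        schema.put("organizationalPerson", Arrays.asList("cn", "title", "telephoneNumber"));
        return schema;
    }

    // Union of the attributes of all declared classes, in first-seen order,
    // so that attributes shared between classes become a single column.
    static List<String> columnsFor(List<String> objectClasses, Map<String, List<String>> schema) {
        Set<String> columns = new LinkedHashSet<>();
        for (String objectClass : objectClasses) {
            List<String> attributes = schema.get(objectClass);
            if (attributes == null) {
                throw new IllegalArgumentException("Unknown object class: " + objectClass);
            }
            columns.addAll(attributes);
        }
        return new ArrayList<>(columns);
    }

    public static void main(String[] args) {
        System.out.println(columnsFor(
            Arrays.asList("user", "organizationalPerson"), sampleSchema()));
    }
}
```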

                           

                          This might be the most practical way to support (limited) dynamic metadata. What do you think?

                           

                          Before 7.1 is fine too for this feature.

                          OK, I can add this. I just need some guidance on your unit test requirements when adding new functionality like this.

                           

                          Also, the direction of Teiid is to define its metadata independently of the Designer-based metadata. Teiid would like to define its view layers etc. using a DDL-like language, and tooling will be generating that metadata based on its models in 8.0 and beyond.

                          This sounds very interesting.

                          • 10. Re: Metadata for LDAP Connector
                            shawkins

                            Based upon what you're describing, dynamic metadata integration is not a good fit for LDAP, since it requires custom metadata down to the table level.  As Ramesh says, that will later be handled with DDL extensions.

                             

                            For now, a different approach may be to use the relational work from ModeShape.  They are already developing a configuration-based approach for mapping node types to tables, attributes to columns, etc.  And since they support an LDAP connector, the same thing should work with exposed LDAP nodes.  The ModeShape JDBC integration is still in the works, so I'm not sure about supported predicates/functions, etc., or what the potential performance implications are.

                            • 11. Re: Metadata for LDAP Connector
                              michael.walker

                              Agreed. DDL sounds promising.

                               

                              In a sense, the current connector already handles one of the use cases for dynamic 'metadata' quite well -- if new nodes are created underneath a node you've modeled as a table, the connector will automatically pick up all entries underneath the new node. For example, if you add "Canada" as a new directory under your "Users" tree, and "Users" has already been modeled as a table, then you'll automatically get all the new user entries under the Canada subtree, potentially including the Canada node as well, without any modeling changes required.
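
The scope semantics behind this can be illustrated with the JDK's javax.naming.ldap.LdapName class. In practice the directory server evaluates the scope, so this client-side check is only a sketch of the rule; the ScopeCheck class and the sample DNs are hypothetical.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

// Hypothetical client-side check mirroring subtree and one-level scope rules.
class ScopeCheck {

    static LdapName dn(String text) {
        try {
            return new LdapName(text);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Bad DN: " + text, e);
        }
    }

    // Subtree scope: the base itself and everything below it.
    // LdapName indexes RDNs from the right, so the base DN is a "prefix".
    static boolean inSubtree(String entryDn, String baseDn) {
        return dn(entryDn).startsWith(dn(baseDn));
    }

    // One-level scope: only entries exactly one RDN below the base.
    static boolean isDirectChild(String entryDn, String baseDn) {
        LdapName entry = dn(entryDn);
        LdapName base = dn(baseDn);
        return entry.size() == base.size() + 1 && entry.startsWith(base);
    }

    public static void main(String[] args) {
        String users = "ou=Users,dc=company,dc=com";
        String canada = "ou=Canada,ou=Users,dc=company,dc=com";
        String bob = "cn=Bob,ou=Canada,ou=Users,dc=company,dc=com";
        System.out.println("Canada in subtree: " + inSubtree(canada, users));
        System.out.println("Bob in subtree: " + inSubtree(bob, users));
        System.out.println("Bob direct child: " + isDirectChild(bob, users));
    }
}
```

With a subtree-scoped "Users" table, both the new Canada node and the entries under it satisfy the check, which is why no modeling change is needed when such nodes are added.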

                               

                              We just won't be able to support the automatic modeling of all nodes as tables, or the automatic modeling of all attributes as columns.

                               

                              As far as tables are concerned, I can't see this being a practical use case, nor does it seem there is any good default strategy for doing it. For attributes, I can see more usefulness, and I think I've described one reasonable strategy, but it requires that you at least model down to the table level, and would require a significant rework of the def file.