1 2 Previous Next 21 Replies Latest reply on May 30, 2004 5:47 AM by hbaxmann

    Nulls in CMR where clause

    erik777

      Does anyone know what the status is of fixing the ability to use 'table.col = ?1' in EJB-QL without losing rows with null values? I remember that some future version of JBoss was supposed to address it, but can't remember which version, so don't know if it is supposed to be out now.

      I'd even settle for something along the lines of:

      table.col = ?1 OR (table.col IS NULL AND null=?1)

        • 1. Re: Nulls in CMR where clause
          ironbird

          ??? What do you say ?
          You can get null values rows in EJB-QL testing by IS NULL.

          For example the EJB-QL query

           * @ejb.finder
           * view-type="remote"
           * signature="java.util.Collection findCustomers(Pharma_EJB.interfaces.Pharma_Customer customer)"
           * query="SELECT DISTINCT OBJECT(c) FROM Pharma_User c WHERE c.customer IS NULL OR c.customer=?1"
          

          generates the following SQL
          Executing SQL: SELECT DSITINCT t0_c.IND_ID FROM TD_USER t0_c, TD_CUSTOMER t1_c_customer WHERE (t0_c.CLI_ID IS NULL) OR ((t1_c_customer.CLI_ID=?) AND t0_c.CLI_ID=t1_c_customerCLI_ID)
          

          Where is the problem ?

          The spec says explicitly that the null values do not participate in the relation, so you must test it with IS NULL explicitly (snippet from the spec):

          Path expression navigability is composed using ?inner join? semantics. That is, if the value of a non-terminal
          cmr-field in the path expression is null, the path is considered to have no value, and does not participate
          in the determination of the result.


          • 2. Re: Nulls in CMR where clause
            erik777

            The problem with that query is that it will return rows where c.customer IS NULL but the parameter ?1 is not null. You only want rows where c.customer is null and ?1 is also null. The equals doesn't work, and IS NULL only tests one side of the equation. You can't apply it to the parameter to be sure both the parameter and c.customer are null at the same time.

            I've worked around this by creating multiple queries to test for the various scenarios, and then aggregating the results. That's extremely inefficient, but the runtime inefficiency is not the biggest problem.

            The problem is that the number of queries you need is the number of optional parameters you have to the power of 2. For instance, if you have two optional parameters, then you need 4 queries to cover all possible combinantions of null and not null values. However, I'm asking this question today because I now had a case where the business key has 4 optional columns. This would require 16 queries, and I'm not going there.

            To get around this for now, I created a "dummy null" value, and have a lot at points to convert it back and forth between a real null. This was at least better than creating 16 EJB-QL queries and creating a lot of Java code to tie them together.

            • 3. Re: Nulls in CMR where clause
              erik777

               

              Path expression navigability is composed using ?inner join? semantics. That is, if the value of a non-terminal cmr-field in the path expression is null, the path is considered to have no value, and does not participate in the determination of the result.

              If that's the standard, then that is just assinine. Why should I have to choose between 16 EJB-QL queries or "fake" nulls?!? Imagine 8 optional parameters. That would require 256 EJB-QL queries, rendering "fake" nulls virtually inevitable!

              If I were generating the SQL myself, I would do this to generate the component of the WHERE clause for one comparison (in psuedo code):

              if ({the parameter is null})
               whereComponent = field + " IS NULL ";
              else
               whereComponent = field + " = " + parameter;
              


              If this violates the standard, then the standard is overdue for a change. I heard that UNIONS were coming out in a future release. That's a cheesy solution to me. There's no reason why it can't be a simple and fast query. UNIONS are slow.

              I'm not blaming JBoss at all for this. I blaim the JCP. I love J2EE. It has a lot of good well thought out standards. But this is not one of them.



              • 4. Re: Nulls in CMR where clause
                ironbird

                You can do what you want with DynamicSQL, which is specific to JBoss. But the J2EE spec do not address your needs.
                Even in any language (PL-SQL, T-SQL, C++ or whatever), you need to build dynamically your SQL string to do what you want.
                Why do you want the container do the job for you ?

                • 5. Re: Nulls in CMR where clause
                  erik777

                   


                  Why do you want the container do the job for you ?


                  Because I am using container managed persistence (CMP) and EJB-QL is a the Query Language?

                  Are you asking why I'm using container managed persistence?


                  - To obtain database vendor indepedence. The application is 100% compatible today with MySQL, DB2, Oracle, SQL Server and any other database that provides a compliant JDBC driver and a configuration in the J2EE server. The differences between them that that JDBC doesn't handle, JBoss handles through CMP configurations.


                  - To develop using a Java database interface that's independent of the underying database technology. JBoss and other vendors just happened to be using JDBC/relational to implement CMP today because relational databases are still popular. But there's nothing in the spec to prevent them from adding an object database engine for CMP down the road. I won't have to change a single line of code to be able to use object databases when this happens, since I never use SQL in the application.


                  • 6. Re: Nulls in CMR where clause
                    ironbird

                     


                    Because I am using container managed persistence (CMP) and EJB-QL is a the Query Language?

                    EJB-QL query is for persistence with additional features, not complex queries. I agree with you that the spec is not adapted with professional applications. I work in industry and many times I cannot rely on the spec OR THE CONTAINER to satisfy my customers.


                    To obtain database vendor indepedence. The application is 100% compatible today with MySQL, DB2, Oracle, SQL Server and any other database that provides a compliant JDBC driver and a configuration in the J2EE server. The differences between them that that JDBC doesn't handle, JBoss handles through CMP configurations.

                    Just for persistence, you means. And your query, even implemented in SQL, is 100% compatible with all the databases because SQL address the database compatibility before java and J2EE.


                    To develop using a Java database interface that's independent of the underying database technology. JBoss and other vendors just happened to be using JDBC/relational to implement CMP today because relational databases are still popular. But there's nothing in the spec to prevent them from adding an object database engine for CMP down the road. I won't have to change a single line of code to be able to use object databases when this happens, since I never use SQL in the application.

                    No its false, because SUN impose its certification and any vendor must comply if he want to keep its customers. But do you experiment vendor compatibility ? I do. Normally, the "Container Managed" means that you can migrate from one vendor to another. And it's often difficult, and I can tell you that you spend more time to be 100% vendor compatible that 100% database compatible, even if the containers and EJB-QL are limited.
                    You have always to separate the system level (the container) from the business level (you). The frontier is not always where you want !


                    • 7. Re: Nulls in CMR where clause
                      hbaxmann

                      IMHO there is no big, but fundamental problem:

                      - if your app rely on more then one _significant_ optional field - your ER model sucks
                      - NULL compared to anything gives false, even NULL=NULL

                      solution: build a db view per optional field, null-'key' rows will vanish, not the denormalized table

                      what you are trying to accomplish is a case clause in SQL, as we know: every case statement could be outfactored better by working on your model

                      ;-)

                      bax

                      • 8. Re: Nulls in CMR where clause
                        erik777

                        hbaxmann:

                        solution: build a db view per optional field, null-'key' rows will vanish, not the denormalized table


                        This isn't a relational issue. The query I mentioned with 4 columns ironically has no relationships whatsoever. However, it does have 4 optional columns I need to filter on. They happen to be parts of a business key, but that's arbitrary. They could be any optional column that I need to filter on. Remember, this issue is I'm passing a parameter and pulling values that match that paramter. This isn't complext, and has nothing to do with normalization because it has nothing to do with joins or relationships.

                        if your app rely on more then one _significant_ optional field - your ER model sucks


                        Huh?!? More than half a table's columns are often optional. Any column can be filtered via a parameterized where clause in the real world. These are common business requirements. Just because a customer table does require an email address doesn't mean the user shouldn't be able to query based on the email address. Just because an email address is optional doesn't mean that every other piece of customer information has to be mandatory. How is having multiple optional columns a bad model?

                        And your query, even implemented in SQL, is 100% compatible with all the databases because SQL address the database compatibility before java and J2EE.


                        Wouldn't that be nice if it were true. To name a few differences: the formatting of a date in the where clause, unique identifier generators and the syntax of outter joins. I wrote one system that ran in both SQL Server and Oracle. Last year, I lead the migration of a J2EE application to use a new DB2 data model with 150+ tables while still working 100% with its Oracle 90+ table predecessor. To complicate things, we had to use stored procedures for all data communication for DB2 (dumb "security" standard .) The calculation code, which never stopped being fully functional during the entire migration, was 100% independent of the database vendor, but it it wasn't SQL that made it possible. We used a Data Accessor pattern.

                        There are two two problems with SQL: The ANSI 89 and 92 standards that became popular did not address all the needs (e.g., dates in the WHERE clause). The second problem is that it is relational, making it difficult to migrate to object database technologies. Thus... CMP, and the current trend of using object-relational to implement it.

                        If Sun's Java certification has anything that requires CMP to use a relational database, then they are not testing the standard, because nothing in the standard requires a relational database for implementation. If they relational database requirements have drifted into it, then it's probably just because every vendor out there implemented it with a relational model. Thus, when vendors start to implement it with object databases as well, Sun will be obligated to remove any relational specific requirements that slipped into their test suite.




                        • 9. Re: Nulls in CMR where clause
                          erik777

                          hbaxmann:

                          solution: build a db view per optional field, null-'key' rows will vanish, not the denormalized table


                          This isn't a relational issue. The query I mentioned with 4 columns ironically has no relationships whatsoever. However, it does have 4 optional columns I need to filter on. They happen to be parts of a business key, but that's arbitrary. They could be any optional column that I need to filter on. Remember, this issue is I'm passing a parameter and pulling values that match that paramter. This isn't complext, and has nothing to do with normalization because it has nothing to do with joins or relationships.

                          if your app rely on more then one _significant_ optional field - your ER model sucks


                          Huh?!? More than half a table's columns are often optional. Any column can be filtered via a parameterized where clause in the real world. These are common business requirements. Just because a customer table does require an email address doesn't mean the user shouldn't be able to query based on the email address. Just because an email address is optional doesn't mean that every other piece of customer information has to be mandatory. How is having multiple optional columns a bad model?

                          hbaxmann:

                          I think you understand where I'm coming from. J2EE has some great enterprise quality standards. But other standards, such as JAAS, don't fit the bill for "enterprise" readiness, IMHO.

                          While I know CMP has is a bit short on some critical areas, such as its support for optional columns in a query where the parameter can be null, I'm investing in CMP for the long run vision. Thus, I'm taking the time to try to find the best-of-breed solutions that others are using to deal with the bump along the way; and, in the absense of such, trying to sense the priorities of others and how it's driving the improvement of the standard. In other words, I need to understand the priorities the CMP developers are sensing so I can bet on likely improvements.

                          And your query, even implemented in SQL, is 100% compatible with all the databases because SQL address the database compatibility before java and J2EE.


                          Wouldn't that be nice if it were true. To name a few differences: the formatting of a date in the where clause, unique identifier generators and the syntax of outter joins. I wrote one system that ran in both SQL Server and Oracle. Last year, I lead the migration of a J2EE application to use a new DB2 data model with 150+ tables while still working 100% with its Oracle 90+ table predecessor. To complicate things, we had to use stored procedures for all data communication for DB2 (dumb corporate "security" standard.) The calculation code (business logic,) which never stopped being fully functional during the migration, was 100% independent of the database vendor, but it wasn't SQL that made it possible. We used a Data Accessor pattern.

                          There are two problems with SQL: The ANSI 89 and 92 standards that became popular did not address all the needs (e.g., dates in the WHERE clause). The second problem is that it is relational, making it difficult to migrate to object database technologies. Thus... CMP, and the current trend of using object-relational to implement it. CMP offers a migration path if you believe it will one day support object database implementation.

                          If Sun's Java certification has anything that requires CMP to use a relational database, then they are not testing the standard, because nothing in the standard requires a relational database for implementation. If the relational database requirements have drifted into it, then it's probably just because every vendor out there implemented it with a relational model. Thus, when vendors start to implement it with object databases, Sun will be obligated to remove any relational specific requirements that slipped into their test suite. However, this is all conjecture. Do we even know that Sun's test suite has relational database requirements unwittingly embedded into it?

                          It's Sun's own website that states that CMP does not need to be implemented using a relational database. Here's one instance:

                          "It is unlikely that the EJB expert group will accept any form of Java transparent persistence as an EJB requirement; however, some J2EE vendors may well use JDO as the underlying implementation of CMP."


                          - http://java.sun.com/products/jdo/JDOCMPFAQ.html


                          • 10. Re: Nulls in CMR where clause
                            hbaxmann

                             

                            This isn't a relational issue.


                            Right, because in this kind of usage your Entity design simply lacks. This solution is a workaround. In most cases one is not able to tidy up, normalize an already implemented ER model.
                            The one and only solution is to have a 3rd NF ER model - most times impractical. Here comes the DB views for select statements on _denormalized_entities_ (ER entities, not EJB ;) in the mind.
                            In the case of your NULL columns, your ER model is not even in the 1st NF. NULL means 'column does not exist here'. NULL is is a 'workaround' in RDBMSes for not to force to normalize till the 5th NF everywhere. Ok, 3rd NF ;)

                            Huh?!? More than half a table's columns are often optional.


                            Reading may be of help: I wrote '_significant_ '.

                            Any column can be filtered via a parameterized where clause in the real world.


                            I know, but you are not working in a real world, you are talking about a RDBMS. Exactly the fields in a where clause are kind of keys, if you are happy with this or not. Even in a single table query.

                            How is having multiple optional columns a bad model?


                            See above regarding _significant_.
                            This is not what I am trying to explain: Building relations, (if clauses like where statements) even between the same table based on unnormalized data is not possible. Real world: ... in most cases.

                            What i suggest is to normalize your model for your queries via DB views and than use EJB. Dont blame the J2EE for not resolving bad RDBMS design _and_usage_.

                            Buy a book about ER modelling and implementing this model in RDBMS and a pump gun for the DB designer (most often the noise is sufficient).

                            bax

                            • 11. Re: Nulls in CMR where clause
                              erik777

                              hbaxmann:

                              You are clearly learning relational theory in an academic setting. What school are you attending? What is your major?

                              For one, no one completely normalizes data in the real world, and by real world, I mean business applications. When someone does, it's clear it's a college student or grad that hasn't had the experience beyond what he learned in school.

                              Null columns exist because the user is not always required to provide them. Like you said, it gives us the ability to detect that there is no data for it. It's a very nice concept and works really well in relational databases. There's no normalization problem because you have optional columns. If I wasn't using EJB-QL, but DynamicSQL, as someone suggested, or direct JDBC, I wouldn't have a problem at all.

                              Using the example of customer registration, here are columns I might make optional because I wanted customers to register even if they didn't have or want to provide the information:

                              - All address information. Definately second address line if I chose to require address information.
                              - Second email address.
                              - Additional phone numbers.

                              Now, all this info could be normalized, but with an OO front-end, the benefits are near nill. Take the phone number, for instance. Phone numbers can be handled in a standard way via application objects. This mutes the benefits of normalization.

                              However, even if I normalized it, I would still have optional foreign keys... and thus nulls. This would not resolve the problem EJB-QL has, as it applies to relationships as well as regular columns. Optional is optional, whether they are regular columns or relationships. Optionality is defined by the business requirements. If the phone number is not required information, then either a column for phone number data or a foreign key can be null.

                              As for ER modeling, I've been developing relational applications since the early 80s prior to SQL's definition in 89. I've read quite a few books over those years, including the adademic books you're reading now, as I have a degree in IS. I've used ERwin, Oracle Designer, Rational Rose and other tools for ER modeling. Last year, I lead the effort to create a 150+ table data model for a large commercial payroll company for DB2. We used Rational Rose to create an ER model. When that was complete, we generated the data model, and then revamped it some more. Then we generated it in DB2 migrated an entire commercial payroll system to use it. It was successful. We did a lot of normalization of the data coming in, which came from the mainframe in over 100 formats. Creating a practical normalized model was one of our goals. But knowing when to normalize and when not to normalize isn't something you can learn in an academic book on relational theory. It's something you learn by repeatedly doing it in the real world for many years. Building data models for large businesses over and over again is the real world. Learning relational theory in a class is not.

                              Using views to try to normalize a single table is rediculous in this case, both in terms of practability and the end result's ability to actually solve the EJB-QL issue. Even if it did somehow miraculously help, it would run completely counter to why I'm using CMP, and the database vendor independence required by the application. Please read comments I made earlier on why I am using CMP.

                              It's exciting to see you learn relational theory. Clearly, you are grasping the concepts your class is teaching you. However, I highly recommend you try to learn from others who have applied it in business environments, and give them the benefit of the doubt until you have a chance to learn the practicality of some of the academic ideas when developing large business applications.

                              • 12. Re: Nulls in CMR where clause
                                hbaxmann

                                nice to meet you, too

                                bax

                                • 13. Re: Nulls in CMR where clause
                                  ironbird

                                  You are too much revendicative to exchange with both of you. When you want to justify yourself so much, it's suspect for me. Continue to consider everyone as newbie while you seems to stand yourself on some cloud, nobody's care.
                                  Nice to meet you too, but not too much !

                                  • 14. Re: Nulls in CMR where clause
                                    erik777

                                    ironbird:

                                    You're right. And I do apologize for coming off as condescening.

                                    Believe me, though, it has nothing to do with his being a newbie. I am normally very supportive of people learning something new.

                                    His tone was condescending, so I felt he needed to hear himself in context. There's a difference between someone that is new, and someone that just learned something in a class and think they know it all and can log onto a forum to insult everyone that think exactly like he thinks. I figured a gentle correction on a JBoss forum was a better place for him to learn than on a job. Unfortunately, I didn't know how to do that without coming off as condescending.

                                    I would have probably dismissed what he said outright, if it wasn't for his display of lack of ettiquette:

                                    - "your ER model sucks"
                                    - "because in this kind of usage your Entity design simply lacks"
                                    - "dont blame the J2EE for not resolving bad RDBMS design"

                                    This is in addition to outright illogical statements he made, his assumption that he's completely right without even understanding the problem, his inability to listen to why I chose certain options, like avoiding dynamic SQL or direct JDBC, and the lack of inquiry into the actual table design and purpose.

                                    By the way, the table which he thinks needs to be normalized with views only consists of a primary key (generated sequence number), a composite business key, and a single column for data. It's ok if his logic is incomplete, but he should try not to condescendingly insult people when he displays his lack of understanding of other people's problems.

                                    1 2 Previous Next