Now, I managed to get data retrieval on first call down to 4 seconds, and less than 800 ms after that first call.
Still, 4 seconds is a lot of time for just 328 records.
I might soon get more than 10,000 records in that table.
Of course, a production server will perform better (I am running a 550 Mhz PC with Win2k SP2, jdk 1.4), but I would like to tune my local application as much as I can.
How are you retrieving the data ?
Via an Entity ?
May be you could step away from the theory & for these cases connect directly to the DB from the SessionBean.
Of course, if you need to do that for most your calls you should probably solve you performance issue directly !
One hint: never call entity finders from a client application. The result is the behavior you pointed: a very inefficient entity loading.
This is because a new transaction should be started everytime you call the next() method on the collection iterator from an entity bean.
Write a session bean and call your finder from there. Return a collection (or array) of value objects (or data objects, or data transfer objects, or whatever you call them).
I hope you known about value objects and similar patterns. If not, hurry to learn. Entity beans without VO's are a performance killer.
Regarding data retrieval directly through a direct database connection: I use CMP 2.x and not BMP so that I do not have to be too much relying on any specific dbms. I am currently using DB2, but it might change in the future. SQL is not that standard.
Here is how my application is working: the client is very thin (just a presentation layer) and I use session facades for client/server data exchanges. The client never accesses directly any EJB. On the server, calls to entity beans are encapsulated in session beans which are part of the data access layer. So that a call to a finder method such as findAll uses one transaction. My entity beans fields contain Integers, Strings and java.sql.Date. That is all. And result sets are built upon what I call "detail" classes which contain only the value of fields in an EJB.
Using timers, I found that initializing the cache on a findAll call takes approximately 900 ms, and then the first SQL statement executed takes 2600 ms (average). So that is a total of 3,5 s for 328 small records, on first call. Later it takes less than 800 ms.
The problem remains, that is if my table contains 30,000 records then it might take almost 6 minutes for initializing on the server.
I have already tried to tune my db2 connection, i am using commit-option A, only one transaction per call, i have modified the size of the pool. Is there anything else ? Right now, I am using tcp-ip for db2 connection, and knowing that the database and jboss run on the same machine, is there a better way ? Is there something I am missing regarding the CMP 2.x container configuration ?
from my experience, the problem relates only the JDBC driver:
The conversion between an SQL record and a Java object.
To test/compare you could try to get the records with an SQL statement and return the desired fields without creating any object (ev. a Vector or a collection).
A SQL statement such as "SELECT * FROM MYTABLE" executed directly through JDBC driver takes less than 18ms to return 328 records, with no optimization. Is it the transformation into java objects which takes so long ? And why ?
This depends of the implementation of the JDBC driver.
I found native drivers slower than ODBC drivers over sun JDBC implementation...
There is a big performance difference if, for example, the DB fields are accessed by column name or column number. But this is the JDBC driver implementation.
Try another DB/JDBC (ex. MySQL) to see the differences.
Does the really slow time perhaps include opening the first connection to the db?
yes & no
Normally the connection pooler should ALWAYS keep the minimum requested number of connection openned. Hence & unless misconfigured the Connections to the DB should be set up already via the minimum number of pooled Connection !
However, may be after the first (a few) call the MetaData of the DB are then loaded / cached.
"Since the draft release of the Human Genome, we're all Open Source!"
Well, a little thought will show that it is not possible for the connection pool to open any connections until it knows who to open them as. So, it now waits until you request one connection, then, when it knows the Subject, asynchronously fills the pool to the min size. However since this is asynchronous you shouldn't really see a big delay after your first connection, and I would expect the deployment process that gets the db metadata would open the first connection. So, I don't really have an explanation or a theory.
Indeed, I believe that on first call, a connection is already opened and maybe pooled since it has to get the database metadata on deployement in order to process the <create-table> and other stuff.
I do not exactly know how Jboss works internally, but I found out that the real SQL statement is processed only when accessing a field within one of my entity bean. For instance, for the MyEntityBean.findAll() it takes 900 ms, and then 2600 ms (average) when accessing MyEntityBean.getMyField() because the SQL is processed only here. Maybe this is bad idea (or it is already done), but would it be possible to "anticipate" by processing the SQL statement asynchronously ? Changes are that if I call a findAll method, I might be willing to access at least one field. Then why wait for an effective call to a getMyField() ?
I haven't yet really understood the read ahead stuff, however if you aren't using it I think you should be;-) It may be able to get the fields with the query for findAll().
> Indeed, I believe that on first call, a connection is
> already opened and maybe pooled since it has to get
> the database metadata on deployement in order to
> process the <create-table> and other stuff.
Nope, this is set in the parameters of the connection pool !
In my jboss.jcml, there is :
> This a follow-up to my previous message.
> I was right, it takes more than 100 seconds to
> retrieve only 328 (small) records from the Jboss
> server with a DB2 database.
> What happens is that the SQL query is not executed on
> findAll, but only when there is an explicit call to a
> get method on an EJB.
You can configure this behavior pr. CMP-bean. Take a look at jbosscmp-jdbc.xml (dtd). With read ahead, jboss can load more than one entity (row) at a time. You can also configure which fields it loads automatically, before they are used (and which groups of fields that should be loaded together).
> And the SQL generated seems to
> be very inefficient: if my table contains 328
> records, then the first SELECT statement will contain
> 328 expressions like (MY_IDENTIFIER=?) OR
> (MY_IDENTIFIER=?) OR (MY_IDENTIFIER=?) OR
> (MY_IDENTIFIER=?) (...), the second SQL statement
> will contain 327 expr like the one above, and so on,
> till the last record which will contain only one
I agree. I have also seen this, and I am sure it can be made better, by using "IN"-sql-statement or joining. I have reported this as a bug (http://sourceforge.net/tracker/?func=detail&aid=532734&group_id=22866&atid=376685).
It is assigned, but not resolved.
> I understand it is an automated code generating
> mechanism, but if it takes more than 100 seconds just
> to retrieve 328 records, then what about big tables
> with thousands or more records ?
> I read a post by Pazu who had the same problem. My
> commit option is A, and all the calls are within the
> same transaction. Is there anything else that needs
> to be tuned ?