1 Reply Latest reply on Nov 7, 2006 3:28 AM by jilles

    lucene integration questions

    jilles

      Hi, I'm currently working on an application that will make use of the experimental Lucene implementation. I managed to get things working in the sense that I have an application with EJB3 entity beans that are being indexed by Lucene in JBoss 4.0.5. I've examined the created index with Luke (a tool for inspecting Lucene indexes) and things seem to be indexed properly. IMHO not bad for a few days of work.

      Similar to getting the EJB 3.0 stuff going in the first place, I had a lot of problems with inaccurate/incomplete documentation and had to figure out many non-trivial configuration details myself that the documentation seems to assume are self-evident. I'm not whining here, just observing. I understand this is a work in progress and am generally very excited about this cool new functionality in JBoss. BTW, I'll be glad to help other users with more details on how I got things working (you can contact me offline at jilles AT jillesvangurp.com). You may also be interested in my reply to sergiu's question here: http://www.jboss.com/index.html?module=bb&op=viewtopic&t=92392

      Anyway, I still have a lot of remaining questions regarding the Lucene integration in JBoss/Hibernate.

      1) I'm expecting my application to eventually be used in a clustered JBoss environment. I happen to know that Lucene is pretty intolerant of multiple IndexWriter objects trying to get the same file lock. How does JBoss deal with this (if at all)? To be clear: my application will run on a cluster. At any time, multiple entity beans may be created/updated and indexed on any of the nodes. At the same time, users may be searching for entity beans using the Lucene index, so there will be simultaneous reads and writes on the index. Right now, I'm merely assuming that the Hibernate Lucene integration includes functionality that deals with this in such a way that I don't get exceptions about file locks. Is that assumption at all correct? If not, what is the recommended course of action?

      2) Is it possible to configure the index that hibernate-lucene creates? Lucene has a lot of configuration options that become highly relevant when you work with large indexes (think a few million records in the database and an index of several gigabytes). I anticipate that at some point I will need to configure, or at least understand, those kinds of settings. How/where do I do this, and how sensible are the defaults used by Hibernate/Lucene?

      3) Can I create an IndexReader the normal way (i.e. as the Lucene documentation describes) to search the indexes, or do I need to do special things to avoid conflicts with the index being written to simultaneously?
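      The constraint behind questions 1 and 3 can be made concrete. Lucene allows only one IndexWriter to hold the write lock on an index directory at a time, while any number of readers can search concurrently, each seeing the index as it was when the reader was opened. The following is a minimal sketch of respecting that rule inside one JVM. It assumes a hypothetical `SerializedIndexWriter` class (not part of Lucene or Hibernate) and uses a plain in-memory list as a stand-in for the index: every write goes through one lock, and readers work on an immutable snapshot without blocking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for an index: writes are serialized, reads never block. */
public class SerializedIndexWriter {
    // One lock per index, mirroring Lucene's single-writer rule.
    private final ReentrantLock writeLock = new ReentrantLock();
    // Immutable snapshot, replaced wholesale so readers never see a half-applied write.
    private volatile List<String> snapshot = Collections.emptyList();

    /** Applies a write while holding the lock; concurrent writers wait their turn. */
    public void addDocument(String doc) {
        writeLock.lock();
        try {
            List<String> next = new ArrayList<>(snapshot);
            next.add(doc);
            snapshot = Collections.unmodifiableList(next);
        } finally {
            writeLock.unlock();
        }
    }

    /** Readers scan the latest published snapshot without taking the lock. */
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : snapshot) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        SerializedIndexWriter index = new SerializedIndexWriter();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        // 100 "entity beans" indexed concurrently from 8 threads.
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> index.addDocument("entity-" + n));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(index.search("entity-").size()); // 100
    }
}
```

      Note that this only coordinates threads within a single JVM. On a cluster, each node would have its own lock and the nodes would still contend for the same index files, which is exactly the open issue in question 1.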

      BTW, I understand all this stuff is under construction and not yet recommended for production use. My backup plan, should things not work out with the Hibernate integration, is to create a separate, non-clustered, Lucene-based indexing server that will be called using RMI. I've worked with Lucene before, so I know how to do this. The reason I'm looking at Hibernate Lucene is that it seems to handle most of the boring work of interacting with Lucene. It also seems to simplify my deployment architecture, since I won't need a separate Lucene server. My indexing needs are pretty straightforward (only one or two simple entity beans will be indexed).
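      The backup plan can be sketched as a remote interface plus a server that owns the only writer. In this sketch, `IndexingServer`, `IndexingService` and `LocalIndexingService` are hypothetical names, and a `ConcurrentHashMap` stands in for the Lucene index; a real server would wrap an IndexWriter and IndexReader instead. Every update is queued onto a single writer thread, so cluster nodes never contend for a lock file, while searches read directly.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndexingServer {

    /** Hypothetical remote contract the cluster nodes would call over RMI. */
    public interface IndexingService extends Remote {
        void index(String id, String text) throws RemoteException;
        void delete(String id) throws RemoteException;
        List<String> search(String term, int maxHits) throws RemoteException;
    }

    /**
     * In-memory stand-in for a Lucene-backed implementation (assumption).
     * All updates, from whichever node sent them, run on one writer thread,
     * the role a single IndexWriter would play on the real server.
     */
    public static class LocalIndexingService implements IndexingService {
        private final ExecutorService writer = Executors.newSingleThreadExecutor();
        private final Map<String, String> docs = new ConcurrentHashMap<>();

        @Override
        public void index(String id, String text) throws RemoteException {
            await(writer.submit(() -> { docs.put(id, text); }));
        }

        @Override
        public void delete(String id) throws RemoteException {
            await(writer.submit(() -> { docs.remove(id); }));
        }

        @Override
        public List<String> search(String term, int maxHits) {
            // Reads bypass the writer thread, as searches through an IndexReader would.
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> e : docs.entrySet()) {
                if (e.getValue().contains(term)) {
                    hits.add(e.getKey());
                }
            }
            Collections.sort(hits);
            return hits.subList(0, Math.min(maxHits, hits.size()));
        }

        /** Stops the writer thread so the JVM can exit. */
        public void close() {
            writer.shutdown();
        }

        // Blocks until the queued update is applied; failures surface as RemoteException.
        private static void await(Future<?> pending) throws RemoteException {
            try {
                pending.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RemoteException("interrupted while indexing", e);
            } catch (ExecutionException e) {
                throw new RemoteException("index update failed", e.getCause());
            }
        }
    }

    public static void main(String[] args) throws RemoteException {
        LocalIndexingService service = new LocalIndexingService();
        service.index("1", "lucene integration questions");
        service.index("2", "hibernate entity bean");
        System.out.println(service.search("lucene", 10)); // [1]
        service.delete("1");
        System.out.println(service.search("lucene", 10)); // []
        service.close();
    }
}
```

      To actually serve this over RMI, one would export the object with `UnicastRemoteObject.exportObject` and bind it in a registry created with `LocateRegistry.createRegistry`; that wiring is omitted to keep the sketch short.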

        • 1. Re: lucene integration questions
          jilles

          Since nobody bothered to answer, I'll report the answer myself. I've been testing for a couple of days now and am sorry to report that the Hibernate Lucene integration is in no condition to be used in production. Essentially, it doesn't handle concurrency at all, not even on a single node, so forget about clustering. Trivial use cases produce lots of exceptions in the logs about locks and out-of-date IndexReaders (which are needed for deletes).

          As reported here (http://lists.jboss.org/pipermail/hibernate-dev/2006-October/000563.html), synchronization is still being worked on.

          It would be appropriate to mark the Lucene annotations as highly experimental, list the known issues with respect to concurrent usage in the documentation, and discourage all use until it actually works. Including the Hibernate Lucene classes in a production-quality server is ridiculous, since using them in a multi-user environment is likely to trigger failures.

          I'll be executing my backup plan now.