I- Origins
1. How I came to Lucene
In June 2009, a customer asked me to help solve a geo-related problem with their hotel search engine.
They were (and still are) using Lucene/Solr alongside MySQL to run faceted searches for hotels near a POI (e.g. hotels within 250 m of the Eiffel Tower).
At that time they had trouble getting the Lucene/Solr spatial extension to implement this functionality.
2. Spatial origins in Lucene
The Lucene spatial contrib began with Patrick O'Leary's work on Local Lucene and Local Solr.
He implemented a grid search that was used on AOL sites and was echoed at other companies (http://www.ibm.com/developerworks/opensource/library/j-spatial/)
He offered this work to the Lucene project as a spatial contrib module.
The module was then extended with a GeoHash implementation.
3. What I tried to do in Lucene Spatial
My work on Lucene spatial consisted of a few patches, in the Tile/Cartesian version only: projection to index space, calculation of the ids of the grid cells matching a query, and some geo-related calculations (bounding boxes and limit cases such as a prime meridian crossing or a bounding box near a pole).
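To make the idea of a grid cell id concrete, here is a minimal sketch. It assumes a plain latitude/longitude grid cut into 2^level cells per axis, rather than whatever projection the actual modules use; the class name, method names and the "x|y" id format are hypothetical and only serve the illustration.

```java
public final class GridCellSketch {

    private GridCellSketch() {
    }

    /** Index of the cell containing 'value' on an axis [min, max] cut into 2^level cells. */
    public static int cellIndex(double value, double min, double max, int level) {
        int cells = 1 << level;
        double width = (max - min) / cells;
        int index = (int) Math.floor((value - min) / width);
        // Clamp so that value == max falls into the last cell instead of one past it.
        return Math.max(0, Math.min(cells - 1, index));
    }

    /** Hypothetical cell id: the x (longitude) and y (latitude) cell indexes joined by '|'. */
    public static String cellId(double latitude, double longitude, int level) {
        int x = cellIndex(longitude, -180.0, 180.0, level);
        int y = cellIndex(latitude, -90.0, 90.0, level);
        return x + "|" + y;
    }
}
```

Indexing a document then amounts to storing its cell id for each grid level, and a search only has to look up the ids of the cells that the search area covers.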
II- Current status of Spatial in Lucene
Today the spatial contrib in Lucene is marked as deprecated in Lucene 3.x and is to be removed in 4.x.
The spatial functionality has been moved to Solr, with a double NumericRange type of request (latitude between latmin and latmax, and longitude between longmin and longmax) and a GeoHash implementation.
Grid/Tile/Cartesian support is gone, and moving the spatial code to the Solr level cut off access for Hibernate Search users.
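The double range request boils down to two interval checks against a bounding box. The sketch below is plain Java and does not use the Lucene or Solr API; the class and method names are hypothetical, the Earth is assumed to be a sphere of radius 6371 km, and the limit cases handled are a box wrapping around the ±180° meridian and a box containing a pole.

```java
public final class DoubleRangeSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // assumption: spherical Earth

    private DoubleRangeSketch() {
    }

    /**
     * Returns {latMin, latMax, lonMin, lonMax} in degrees.
     * lonMin > lonMax means the box wraps around the 180 degree meridian.
     */
    public static double[] boundingBox(double latDeg, double lonDeg, double radiusKm) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle, in radians
        double latMin = lat - angular;
        double latMax = lat + angular;
        double lonMin;
        double lonMax;
        if (latMin > -Math.PI / 2 && latMax < Math.PI / 2) {
            // Meridians converge towards the poles, so the longitude span widens with latitude.
            double lonDelta = Math.asin(Math.sin(angular) / Math.cos(lat));
            lonMin = lon - lonDelta;
            lonMax = lon + lonDelta;
            if (lonMin < -Math.PI) {
                lonMin += 2 * Math.PI;
            }
            if (lonMax > Math.PI) {
                lonMax -= 2 * Math.PI;
            }
        } else {
            // The circle contains a pole: every longitude is a candidate.
            latMin = Math.max(latMin, -Math.PI / 2);
            latMax = Math.min(latMax, Math.PI / 2);
            lonMin = -Math.PI;
            lonMax = Math.PI;
        }
        return new double[] {
            Math.toDegrees(latMin), Math.toDegrees(latMax),
            Math.toDegrees(lonMin), Math.toDegrees(lonMax)
        };
    }

    /** The double range test: one interval check on latitude, one on longitude. */
    public static boolean inBox(double[] box, double lat, double lon) {
        boolean latOk = lat >= box[0] && lat <= box[1];
        boolean lonOk = box[2] <= box[3]
                ? lon >= box[2] && lon <= box[3]
                : lon >= box[2] || lon <= box[3]; // wrapped box: two half-ranges
        return latOk && lonOk;
    }
}
```

In an index, each of the two checks would be a numeric range query on the latitude and longitude fields; a wrapped box needs two longitude ranges combined with OR.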
III- A new beginning
So I decided to implement a grid spatial search in Hibernate Search: this is how hibernate-search-spatial was born.
The work is here: https://github.com/nicolashelleringer/hibernate-search-spatial
IV- First feedback
While benchmarking, I also added a double range query type of spatial request, reusing the distance filter created for hibernate-search-spatial.
Use cases are many and one solution does not fit all.
Facts:
- For getting the ids of a subset of documents in a point-and-radius search, Grid is quicker than Double Range
- Grid is less precise than Double Range => it returns more docs than Double Range does
- The distance filter is costly: you have to fetch the actual latitude and longitude of each doc in the subset to verify its distance
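The cost of the distance filter is easier to see in code. The sketch below is not the filter from hibernate-search-spatial: it is a hypothetical, plain Java version that assumes a spherical Earth of radius 6371 km, uses the haversine formula, and takes candidates as {latitude, longitude} pairs already loaded in memory.

```java
import java.util.ArrayList;
import java.util.List;

public final class DistanceFilterSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // assumption: spherical Earth

    private DistanceFilterSketch() {
    }

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Guard against rounding pushing the square root slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Keeps only the candidates ({lat, lon} pairs) within radiusKm of the center.
     * Every candidate coming out of the Grid or Double Range step is visited once.
     */
    public static List<double[]> filter(List<double[]> candidates,
                                        double centerLat, double centerLon, double radiusKm) {
        List<double[]> kept = new ArrayList<double[]>();
        for (double[] point : candidates) {
            if (haversineKm(centerLat, centerLon, point[0], point[1]) <= radiusKm) {
                kept.add(point);
            }
        }
        return kept;
    }
}
```

The trigonometry runs once per candidate, on top of fetching its coordinates, so the fewer false positives the first step returns, the cheaper this second step becomes.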
So, some cases and choices:
- You have a huge base (>1M records) with hot-spot queries (some areas are requested far more often than the rest) and cache hits are high => go for Grid+Distance
- You have a smaller base (<25k records) with purely random queries and no cache hits => go for Double-Range+Distance
Between those extreme cases? Benchmark!
V- More to come
The work is clearly not finished. The current master branch is on Hibernate Search 3.4.1 and should be upgraded to 4
API, SPI, Impl split is still to come
JavaDoc is basic
Tests are basic
VI- Feel free
Test, report bugs, missing functionality, anything
The tracking JIRA issue on the subject is here: https://hibernate.onjira.com/browse/HSEARCH-923
I will write a short first usage guide for those interested; please contact me if you do not want to wait