2011

I- Origins

1. How I came to Lucene

In June 2009, a customer asked me to help them solve a geo-related problem with their hotel search engine.

They were (and still are) using Lucene/Solr alongside MySQL to run faceted searches for hotels near a point of interest (POI), e.g. hotels within 250 m of the Eiffel Tower.

At that time they were having trouble implementing this feature with the Lucene/Solr spatial extension.

 

2. Spatial origins in Lucene

The Lucene spatial contrib began with Patrick O'Leary's work on Local Lucene and Local Solr.

He implemented a grid search that was used on AOL sites and drew some attention at other companies (http://www.ibm.com/developerworks/opensource/library/j-spatial/).

He offered this work to the Lucene project as a spatial contrib module.

The module was then extended with a GeoHash implementation.

 

3. What I tried to do in Lucene Spatial

My work on Lucene spatial consisted of a few patches, in the Tile/Cartesian version only: projection to index space, calculation of the ids of matching grid cells, and some geo-related calculations (bounding boxes and edge cases such as a box crossing the prime meridian or lying near a pole).
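To give an idea of what the bounding box part involves, here is a minimal sketch. It is not the code of the patches themselves: the class and method names are hypothetical, the Earth is assumed to be a sphere of radius 6371 km, and the longitude wrap-around at ±180° is one illustrative edge case among others.

```java
/** Hypothetical helper: bounding box around a centre point for a given radius. */
final class BoundingBoxSketch {
    // Assumption: spherical Earth with its mean radius.
    static final double EARTH_RADIUS_KM = 6371.0;

    /**
     * Returns {latMin, latMax, lonMin, lonMax} in degrees.
     * If lonMin > lonMax, the box wraps around the +/-180 degree meridian.
     */
    static double[] around(double latDeg, double lonDeg, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // search radius as an angle, in radians
        double lat = Math.toRadians(latDeg);
        double latMin = lat - angular;
        double latMax = lat + angular;

        // Near a pole the circle contains the pole itself:
        // clamp the latitude and accept every longitude.
        if (latMin <= -Math.PI / 2 || latMax >= Math.PI / 2) {
            return new double[] {
                Math.toDegrees(Math.max(latMin, -Math.PI / 2)),
                Math.toDegrees(Math.min(latMax, Math.PI / 2)),
                -180.0, 180.0
            };
        }

        // The longitude half-width grows as the centre moves away from the equator.
        double deltaLonDeg = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(lat)));
        double lonMin = lonDeg - deltaLonDeg;
        double lonMax = lonDeg + deltaLonDeg;

        // Bring both bounds back into [-180, 180].
        if (lonMin < -180.0) lonMin += 360.0;
        if (lonMax > 180.0) lonMax -= 360.0;

        return new double[] { Math.toDegrees(latMin), Math.toDegrees(latMax), lonMin, lonMax };
    }
}
```

Calling `BoundingBoxSketch.around(48.8584, 2.2945, 0.25)` gives the box for a 250 m search around the Eiffel Tower. A caller has to treat the "lonMin > lonMax" case as two longitude intervals.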

 

 

II- Current status of Spatial in Lucene

Today the spatial contrib is marked as deprecated in Lucene 3.x and is to be removed in 4.x.

The spatial functionality has been moved to Solr, in the form of a double NumericRange type of request (latitude between latmin and latmax, and longitude between longmin and longmax) and a GeoHash implementation.

There is no more Grid/Tile/Cartesian support, and moving the spatial code up to the Solr level cuts off access for Hibernate Search users.
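The logic of a double range request can be sketched in plain Java as below. This only illustrates the test applied to each document, not Solr's actual code; the class and method names are hypothetical, and the box is assumed not to wrap around the ±180° meridian. In Lucene 3.x the same idea would typically be expressed as two `NumericRangeQuery.newDoubleRange` clauses, one per field, combined so that both must match.

```java
/** Hypothetical illustration of the test behind a double range request. */
final class DoubleRangeSketch {
    /**
     * True when the point lies inside the latitude range AND the longitude range.
     * Assumption: lonMin <= lonMax, i.e. the box does not cross the +/-180 meridian.
     */
    static boolean inBox(double lat, double lon,
                         double latMin, double latMax,
                         double lonMin, double lonMax) {
        boolean latitudeMatches = latMin <= lat && lat <= latMax;
        boolean longitudeMatches = lonMin <= lon && lon <= lonMax;
        return latitudeMatches && longitudeMatches;
    }
}
```

Note that the result is a rectangle of candidates, not a circle: documents in the corners of the box are further away than the requested radius.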

 

 

III- A new beginning

So I decided to implement a Grid spatial search in Hibernate Search: this is how hibernate-search-spatial was born.

 

The work is here: https://github.com/nicolashelleringer/hibernate-search-spatial

 

IV- First feedback

While benchmarking, I also added a double range query type of spatial request, reusing the distance filter created for hibernate-search-spatial.

 

Use cases are many, and one solution does not fit all.

 

Facts:

  • To get the ids of a subset of documents in a point/radius search, Grid is quicker than Double Range
  • Grid is less precise than Double Range => it returns more docs than Double Range
  • The distance filter is costly: you have to fetch the actual latitude and longitude of each doc in the subset to verify it
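The trade-off can be sketched as follows. This is a simplified illustration, not the hibernate-search-spatial code: the names are hypothetical, the grid is a plain fixed-width grid in degrees, and the distance is computed with the haversine formula on a spherical Earth of radius 6371 km.

```java
/** Hypothetical illustration of a grid lookup followed by a distance filter. */
final class GridDistanceSketch {
    // Assumption: spherical Earth with its mean radius.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Index of the cell along one axis: number of whole cell widths from the axis minimum. */
    static int cellIndex(double coordinate, double axisMin, double cellWidthDeg) {
        return (int) Math.floor((coordinate - axisMin) / cellWidthDeg);
    }

    /** Id of the cell containing the point; cheap to index and to match as a single term. */
    static String cellId(double lat, double lon, double cellWidthDeg) {
        return cellIndex(lat, -90.0, cellWidthDeg) + "|" + cellIndex(lon, -180.0, cellWidthDeg);
    }

    /** Great-circle distance in km (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp against rounding errors before taking the arcsine.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** The costly step: needs the actual coordinates of every candidate document. */
    static boolean withinRadius(double centerLat, double centerLon,
                                double docLat, double docLon, double radiusKm) {
        return haversineKm(centerLat, centerLon, docLat, docLon) <= radiusKm;
    }
}
```

With a cell width of 0.01°, the points (48.8584, 2.2945) and (48.8501, 2.2901) fall in the same cell, so a grid match returns both, yet they are nearly 1 km apart: only the distance filter rejects the second one for a 250 m search. A real grid search also has to match every cell overlapping the search area, which this sketch leaves out.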

 

 

So here are some cases and choices:

  • You have a huge base (>1M records) with hot-spot queries (areas that are requested much more often than the rest) and cache hits are high => go for Grid+Distance
  • You have a smaller base (<25k records) with purely random queries and no cache hits => go for Double-Range+Distance

 

Between those extreme cases? Benchmark!

 

V- More to come

The work is clearly not finished. The current master branch is on Hibernate Search 3.4.1 and is to be upgraded to 4.

The API, SPI, Impl split is still to come

JavaDoc is basic

Tests are basic

 

VI- Feel free

 

Test, report bugs, missing functionality, anything

 

The tracking JIRA issue on the subject is here: https://hibernate.onjira.com/browse/HSEARCH-923

 

I will write a short getting-started guide for those interested; please contact me if you do not want to wait.