Lucene – Open Source Search Engine Library

June 25, 2012


Nowadays, search is becoming a more and more important feature in applications. After all, the Web is all about information, and about getting the right information at the right time and into the right hands.

Today I will try to shed some light on search technologies in the J2EE open source software area for beginners. I will also go through different functional implementations of search techniques in separate posts, which I plan to publish within the next few weeks.

One thing worth mentioning here: search algorithms are complex in nature. So big thanks to Mr. Doug Cutting and the Apache Software Foundation for giving us a highly important search library, Apache Lucene, with a standard set of APIs that makes life far easier (at least for me and my fellow J2EE developers…)

Lucene is a search engine library; this is a well-known fact in the application development world.

Our question is: where is the need for such a library?

Yes, it is of course for search. But the world already has database search, with the well-known SQL (Structured Query Language).

And search is quite fast there, even for a table with 1,000,000 records, as long as the search field is indexed. So then….???

We need to think of the flood of content on the Web or in an electronic library, where millions of unstructured documents have to be stored as raw content in the database.

I have gone through a number of such structured and unstructured data handling scenarios, and I am sorry to say that relational databases did not perform well there. (Maybe I lack knowledge of database optimisations, but I leave that portion of the optimisation work to my fellow DBA friends and hardware guys…)

So, is there a low-cost solution???

Yes, we can put Lucene there and, with a little bit of extra logic (the magic word), get out of this situation.

In the language of Lucene, and from an application development perspective:

We can

1>Store the files in the file system.

2>While storing a file, add a document to the Lucene index (put the file content in a Lucene Document field).

3>While removing a file, remove its entry from the Lucene index.

4>Analyse the document with Lucene's StandardAnalyzer.

5>Add an extra field, such as the file path, to the document in the Lucene index.

6>Run the search with the same StandardAnalyzer and

7>Finally get the result at a far better speed than a relational database search over millions of documents.

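So how does this magic happen? Because Lucene only does a text-based search in its index and returns the result to us. That index is an inverted index, which maps each term to the documents containing it, so a query looks up terms instead of scanning the raw content.

8>Now, if we have the link to the file in the file system, we can browse to it, which may be one of our application's goals.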
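To make the steps above a bit more concrete, here is a minimal sketch in plain Java. Please note that this is not Lucene and not how Lucene is implemented; it is only a toy inverted index built on the standard library, to show the idea of "term to file path" lookup. The class name TinyFileIndex, its methods and the file paths are all hypothetical names I made up for illustration, and the analyse method is only a very rough stand-in for what a real analyser does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (NOT Lucene): maps each term to the paths of the files containing it. */
class TinyFileIndex {
    // term -> sorted set of file paths whose content contains that term
    private final Map<String, Set<String>> postings = new HashMap<>();
    // file path -> terms indexed for that file, kept so that removal is cheap
    private final Map<String, Set<String>> termsByPath = new HashMap<>();

    /** Rough stand-in for an analyser: lower-case, then split on non-letters/non-digits. */
    static Set<String> analyse(String text) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Steps 2, 4 and 5: analyse the content and record the file path under each term. */
    void add(String path, String content) {
        remove(path); // re-adding a path replaces its old entry
        Set<String> terms = analyse(content);
        termsByPath.put(path, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(path);
        }
    }

    /** Step 3: remove the entry for a file that is being deleted. */
    void remove(String path) {
        Set<String> terms = termsByPath.remove(path);
        if (terms == null) {
            return; // this path was never indexed
        }
        for (String term : terms) {
            Set<String> paths = postings.get(term);
            paths.remove(path);
            if (paths.isEmpty()) {
                postings.remove(term);
            }
        }
    }

    /** Steps 6 to 8: analyse the query the same way, return paths containing every term. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : analyse(query)) {
            Set<String> paths = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(paths);
            } else {
                result.retainAll(paths);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyFileIndex index = new TinyFileIndex();
        index.add("/docs/a.txt", "Lucene is a search engine library");
        index.add("/docs/b.txt", "SQL is used for database search");
        System.out.println(index.search("search"));         // [/docs/a.txt, /docs/b.txt]
        System.out.println(index.search("Search Library")); // [/docs/a.txt]
        index.remove("/docs/a.txt");
        System.out.println(index.search("library"));        // []
    }
}
```

The point to notice is that search never touches the file contents. It only looks up the analysed query terms in the map and intersects the path sets, which is roughly why an index-based text search can stay fast as the number of documents grows.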

The above use case may not cover every complex real-world problem, and the requirement for database search still exists there as well.

So, in this first post I am not putting any code that uses Lucene itself. I just want to give an idea of open source search technologies, the primary one of which is Lucene.

I will publish more in-depth, scenario-based posts on search techniques and Lucene in the near future.

So for now, just google Lucene and try to grab as many ideas about search techniques as you can.

And just to mention…. wait for my next posts on search techniques….
