Thursday, February 16, 2012

SOLR: Improve relevancy by boosting exact and phrase match

Once the index is ready for searching, the natural next step is to improve the relevancy of the search results. SOLR of course provides many ways to tune relevancy, but one very obvious technique is almost always overlooked: boosting exact and phrase matches above ordinary keyword matches, which can improve relevancy significantly.

Exact Match Setup


To set up a field for exact matching, add a second field in schema.xml and copy the content into it using copyField:
<field name="title" type="text" indexed="true" stored="true" />
<field name="titleExact" type="textExact" indexed="true" stored="true" />
<copyField source="title" dest="titleExact"/>


Notice that the type of titleExact is set to "textExact" (defined below). A similar exact-match effect can be achieved by setting the type to "string", but defining our own field type lets us fine-tune the matching by adding an appropriate tokenizer and filters.
<fieldType name="textExact" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType>

Here I have used WhitespaceTokenizerFactory without any stopword or stemming filters. LimitTokenCountFilterFactory limits the number of tokens (to 20 here), and LowerCaseFilterFactory makes the matching case-insensitive. We can further fine-tune the textExact field type to make the exact match more lenient or more strict, depending on the use case.
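
As one illustration of a stricter variant, the sketch below treats the whole field value as a single token, so only a query equal to the entire title (ignoring case and surrounding whitespace) would match. The type name textExactStrict is hypothetical and not part of the setup in this post; the tokenizer and filter factories are standard Solr analysis classes.

```xml
<!-- Hypothetical stricter variant: the entire field value becomes one token -->
<fieldType name="textExactStrict" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
      <!-- KeywordTokenizer emits the whole input as a single token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- Strip leading and trailing whitespace from that token -->
      <filter class="solr.TrimFilterFactory"/>
      <!-- Keep the match case-insensitive, as in textExact -->
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType>
```

Depending on the query parser and Solr version, a multi-word query may be split on whitespace before it is analyzed for keyword matching, so a field like this would likely contribute mostly through phrase matching, where the whole query is analyzed as one unit.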

Putting it All Together


Now, to boost the exact match field and phrase matching, add the following in solrconfig.xml:
<str name="qf">title titleExact^10</str>
<str name="pf">title^10 titleExact^100</str>

For both keyword and phrase matching, a match on the exact field "titleExact" is boosted higher than a match on the non-exact field "title". In addition, both fields are boosted higher for phrase matching (pf) than for keyword matching (qf). This is a simple first step toward improving relevancy.
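
For context, qf and pf are parameters of the DisMax family of query parsers and are usually set as defaults on a search handler. A minimal sketch, assuming a handler named /select and the edismax parser (both are assumptions; neither is specified in this post):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
      <!-- Assumption: edismax; the classic dismax parser also reads qf and pf -->
      <str name="defType">edismax</str>
      <!-- Keyword matching: exact field weighted above the analyzed field -->
      <str name="qf">title titleExact^10</str>
      <!-- Phrase matching: whole-query phrase matches boosted further -->
      <str name="pf">title^10 titleExact^100</str>
   </lst>
</requestHandler>
```

With defaults like these, a plain request such as q=solr relevancy needs no per-request qf or pf, and any qf or pf passed on the request overrides the defaults.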

15 comments:

  1. Hello,

    Thanks for this nice article,

    I am trying to apply this to my project. I have in schema.xml:



    in solrconfig.xml:



    explicit
    json
    true
    text



    How will I extend solrconfig to include title_exact?

    Thanks

  2. @Service Broker, Regardless of how you are importing data (using the DataImportHandler or curl-based imports), you can apply exact matching to the fields you are already importing. In my example above, I am importing the field title, and I have extended it in my schema.xml by copying its content into another field, "title_exact", and applying the exact-match fieldType to it.
    Hope this helps.

  3. Thank you! It saved my day :)

  4. helpful post. saved my day

    Thanks,

  6. does the last part, with the added lines for solrconfig.xml need to go inside one of the handlers that's already defined in solrconfig.xml?

    Replies
    1. Yes, "pf" and "qf" are defined within your SearchHandler.
