Wednesday, March 13, 2013

Minimum Match per index field: SOLR Ranking and Relevance improvement


With the SOLR minimum match parameter (mm), the constraint is applied collectively across all the fields used for matching (qf). So, if the query has two keywords and each keyword is found in a different matching field, the document is still deemed matched and relevant.
For example,
qf = title, description, keyword
mm = 2<75%
q = adopt a pet dog, where the matching keywords are “adopt”, “pet” and “dog”.

This could match a document titled “Adopting Animals” whose description talks about all kinds of pet animals and whose keyword field lists animals including dog. It could equally match a document titled “How to adopt a dog” with a page describing exactly that. Yet the second document might be ranked lower than the first because of document size and keyword counts in the description, even though it is more relevant to the query.
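To see what a conditional mm spec asks for, here is a minimal sketch of the arithmetic for specs of the form N, P% and N<P%. The class and method names are hypothetical, and this is a simplified illustration of the documented behaviour (at or below the threshold every clause is required, and percentages are rounded down), not Solr's own implementation.

```java
// Simplified sketch: maps an mm spec to the number of clauses that must match.
// Handles "N", "P%" and one conditional "N<...". Names are illustrative only.
final class MinMatchSketch {

    /** Returns how many of the optional clauses must match for the given spec. */
    static int requiredMatches(String spec, int clauseCount) {
        String s = spec.trim();
        int lt = s.indexOf('<');
        if (lt >= 0) {
            int threshold = Integer.parseInt(s.substring(0, lt).trim());
            // At or below the threshold, every clause is required.
            if (clauseCount <= threshold) {
                return clauseCount;
            }
            s = s.substring(lt + 1).trim();
        }
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1).trim());
            int part = (int) (clauseCount * percent / 100.0); // truncates toward zero
            // A negative percentage is the share of clauses allowed to be missing.
            result = percent < 0 ? clauseCount + part : part;
        } else {
            int n = Integer.parseInt(s);
            // A negative number is the count of clauses allowed to be missing.
            result = n < 0 ? clauseCount + n : n;
        }
        // Clamp to the valid range.
        return Math.max(0, Math.min(result, clauseCount));
    }

    public static void main(String[] args) {
        System.out.println(requiredMatches("2<75%", 3)); // 2
        System.out.println(requiredMatches("2<75%", 2)); // 2
        System.out.println(requiredMatches("2<75%", 8)); // 6
    }
}
```

For the three keywords in the example, 75% of 3 rounds down to 2, so any two of “adopt”, “pet” and “dog” are enough, no matter which fields they come from.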

Matching tokens in the description field can also dilute ranking relevance, yet the document might still get ranked higher because of tf-idf. We can lower the weight of the description field relative to title, e.g. qf = title^5, description^2, keyword, which addresses the issue to some extent.

Here we will talk about setting a different minimum match criterion (mm) for each index field, to further restrict matching and keep keywords found in different index fields from diluting relevancy. This solution can help improve document relevancy by 12% to 20% (as measured by a simple result text-similarity score generator).
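The difference between collective and per-field matching can be illustrated with a toy model of the two example documents. This is only a sketch: the token sets are made up and assumed to be already analyzed (lowercased, stemmed), all names are hypothetical, and the per-field check uses OR semantics (at least one field must reach the threshold by itself).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: real matching is done by Lucene, not by set lookups.
final class PerFieldMatchDemo {

    /** Counts how many query terms occur in the given token set. */
    static int hits(List<String> terms, Set<String> tokens) {
        int n = 0;
        for (String t : terms) {
            if (tokens.contains(t)) n++;
        }
        return n;
    }

    /** Collective mm: a term may be found in any field. */
    static boolean matchesCollectively(Map<String, Set<String>> doc, List<String> terms, int required) {
        Set<String> all = new HashSet<>();
        for (Set<String> tokens : doc.values()) all.addAll(tokens);
        return hits(terms, all) >= required;
    }

    /** Per-field mm (OR): at least one field must satisfy the threshold on its own. */
    static boolean matchesPerField(Map<String, Set<String>> doc, List<String> terms, int required) {
        for (Set<String> tokens : doc.values()) {
            if (hits(terms, tokens) >= required) return true;
        }
        return false;
    }

    /** Builds a document from whitespace-separated, pre-analyzed field text. */
    static Map<String, Set<String>> doc(String title, String description, String keyword) {
        Map<String, Set<String>> d = new LinkedHashMap<>();
        d.put("title", new HashSet<>(Arrays.asList(title.split("\\s+"))));
        d.put("description", new HashSet<>(Arrays.asList(description.split("\\s+"))));
        d.put("keyword", new HashSet<>(Arrays.asList(keyword.split("\\s+"))));
        return d;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("adopt", "pet", "dog");
        Map<String, Set<String>> animals = doc("adopt animal", "pet animal care guide", "cat dog rabbit");
        Map<String, Set<String>> adoptDog = doc("how to adopt a dog", "adopt dog shelter", "");

        System.out.println(matchesCollectively(animals, terms, 2));  // true
        System.out.println(matchesPerField(animals, terms, 2));      // false
        System.out.println(matchesCollectively(adoptDog, terms, 2)); // true
        System.out.println(matchesPerField(adoptDog, terms, 2));     // true
    }
}
```

With a required count of 2, the “Adopting Animals” document only qualifies when hits from different fields are pooled, while “How to adopt a dog” qualifies on its title alone.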
Configure new SearchComponent params in solrconfig.xml to set up the per index field mm value. This is only an example; the format of the field mm value depends on your implementation:
com.test.solr.qparser.MinimumMatchFieldQueryProcessor

title_mm=3<75%||description_mm=3<75%||keyword_mm=3<75%
or


Since the minimum match (mm) value is processed and set in the QParser class, we will set the per-field minimum match criteria and update the parameters there.
Here is the QueryProcessor interface to implement:

public interface QueryProcessor
{
     void preprocess(QParser qPlugin);
     Query process(QParser qPlugin, Query prevQuery) throws ParseException;
}

The MinimumMatchFieldQueryProcessor implementation:


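import java.util.HashMap;
import java.util.Map;

import org.apache.commons.lang.StringUtils;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.search.QParser;
// ParseException: import the variant that matches your Lucene/Solr version.

public class MinimumMatchFieldQueryProcessor implements QueryProcessor {
    // Field name -> mm spec, e.g. "title" -> "3<75%"
    private Map<String, String> minMatchFieldsMap = null;
    // "AND": every field must satisfy its mm; anything else: at least one field must
    private String mmOP;
    private String lang = null;

    @Override
    public void preprocess(QParser qPlugin) {
        String fieldsToMatch = qPlugin.getParams().get("minmatch.mm");
        mmOP = qPlugin.getParams().get("minmatch.op", "AND");
        minMatchFieldsMap = new HashMap<String, String>();
        if (fieldsToMatch == null) {
            return; // nothing configured
        }
        // Entries look like "title_mm=3<75%" and are separated by "||"
        String[] fields = fieldsToMatch.split("\\|\\|");
        for (String field : fields) {
            int indx = field.indexOf("=");
            if (indx != -1) {
                minMatchFieldsMap.put(field.substring(0, indx).replaceAll("_mm", ""),
                        field.substring(indx + 1));
            }
        }
    }

    @Override
    public Query process(QParser qPlugin, Query prevQuery) throws ParseException {

        String queryString = CommonUtils.extractPureQuery(qPlugin.getString());

        // Without a query or any per-field mm, leave the previous query untouched
        if (StringUtils.isBlank(queryString) || minMatchFieldsMap.isEmpty()) return prevQuery;

        BooleanQuery bq = new BooleanQuery(true);
        for (Map.Entry<String, String> entry : minMatchFieldsMap.entrySet()) {
            // One nested edismax query per field, each with its own mm.
            // Note: queryString must not contain unescaped double quotes here.
            String subQueryString = String.format("_query_:\"{!edismax qf=%s mm=%s}%s\"",
                    entry.getKey(), entry.getValue(), queryString);

            Query minMatchQuery = qPlugin.subQuery(subQueryString, "lucene").getQuery();

            if ("and".equalsIgnoreCase(mmOP)) {
                bq.add(minMatchQuery, Occur.MUST);
            } else {
                bq.add(minMatchQuery, Occur.SHOULD);
            }
        }
        return bq;
    }
}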
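To make the generated query concrete, the following standalone sketch reproduces only the string handling of preprocess and process, without any Solr classes. The class name, method names and the escaping helper are hypothetical, and the AND/OR joining in plain query syntax is just an approximation of the MUST/SHOULD clauses built above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: builds the nested-query strings as plain text.
final class SubQuerySketch {

    /** Parses "title_mm=3<75%||description_mm=3<75%" into field -> mm spec. */
    static Map<String, String> parseFieldMm(String spec) {
        Map<String, String> result = new LinkedHashMap<>();
        if (spec == null) return result;
        for (String part : spec.split("\\|\\|")) {
            int idx = part.indexOf('=');
            if (idx <= 0) continue; // skip malformed entries
            String field = part.substring(0, idx).trim();
            if (field.endsWith("_mm")) {
                field = field.substring(0, field.length() - 3);
            }
            result.put(field, part.substring(idx + 1).trim());
        }
        return result;
    }

    /** Escapes backslashes and double quotes so the query can sit inside "...". */
    static String escape(String q) {
        return q.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds one nested edismax sub-query per field and joins them with AND or OR. */
    static String buildQuery(Map<String, String> fieldMm, String userQuery, String op) {
        String joiner = "and".equalsIgnoreCase(op) ? " AND " : " OR ";
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fieldMm.entrySet()) {
            if (sb.length() > 0) sb.append(joiner);
            sb.append(String.format("_query_:\"{!edismax qf=%s mm=%s}%s\"",
                    e.getKey(), e.getValue(), escape(userQuery)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fieldMm = parseFieldMm("title_mm=3<75%||description_mm=3<75%");
        System.out.println(buildQuery(fieldMm, "adopt a pet dog", "AND"));
    }
}
```

The escaping step matters because the user query is embedded inside a double-quoted string; a quote typed by the user would otherwise end the nested query early.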

In the next post I will talk about how to add a customized QueryProcessor.