With Solr's minimum match parameter (mm), the constraint is
applied collectively across all the fields used for matching (qf). So if a
query has two keywords and each keyword is found in a different field,
the document is still deemed matched and relevant.
For example:
qf = title description keyword
mm = 2<75%
q = adopt a pet dog, where the matching keywords are “adopt”,
“pet” and “dog”.
This could match a document titled “Adopting Animals” whose
description talks about all kinds of pet animals and whose keyword field lists
animals including dog. It could equally match a document titled “How to
adopt a dog”, with the page describing exactly that. Yet the second document might be
ranked lower than the first because of document size and keyword counts in the
description, even though it is more relevant to the query.
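To make the mm arithmetic for this query concrete, here is a minimal sketch of how a conditional spec such as 2<75% (Solr writes the conditional with `<`) resolves to a number of required keywords. It is a simplified illustration, not Solr's own code: the class and method names are made up for this post, and it only handles a positive integer, a positive percentage, and a single `N<spec` condition.

```java
// Simplified sketch of resolving an mm spec to a required clause count.
// Not Solr's implementation; negative values and multiple conditions are not handled.
final class MinMatchSketch {

    /** Returns how many of the optional clauses must match. */
    static int required(int clauseCount, String spec) {
        spec = spec.trim();
        int lt = spec.indexOf('<');
        if (lt >= 0) {
            // Conditional form "N<rest": with N or fewer clauses, all are required.
            int upperBound = Integer.parseInt(spec.substring(0, lt).trim());
            if (clauseCount <= upperBound) {
                return clauseCount;
            }
            return required(clauseCount, spec.substring(lt + 1));
        }
        int result;
        if (spec.endsWith("%")) {
            // Percentages are rounded down (integer division).
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            result = clauseCount * percent / 100;
        } else {
            result = Integer.parseInt(spec);
        }
        // Clamp: never below 0, never above the number of clauses.
        return Math.max(0, Math.min(clauseCount, result));
    }

    public static void main(String[] args) {
        // "adopt", "pet", "dog" -> 3 clauses; 75% of 3 is 2.25, rounded down to 2.
        System.out.println(required(3, "2<75%")); // prints 2
        // With only 2 clauses the condition does not apply, so both are required.
        System.out.println(required(2, "2<75%")); // prints 2
    }
}
```

So for the three keywords above, any two of them are enough, and nothing requires those two to come from the same field.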
Matching tokens in the description field can also dilute
ranking relevance: a document might get ranked higher purely because of
tf-idf. We can lower the weight
of the description field relative to title, e.g. qf = title^5 description^2
keyword, which addresses the issue to some extent.
Here we will talk about setting a different minimum match
criterion (mm) for each index field, to further restrict matching and keep
keywords found in different index fields from diluting relevancy. This
solution can help improve document relevancy by 12%-20% (per a simple result
text similarity score generator).
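The difference between collective and per-field matching can be sketched with a toy model. This is only an illustration under loud assumptions: the two documents are invented, their text is pre-stemmed by hand, matching is plain whole-word lookup, and the class and method names are hypothetical. Real Solr matching goes through analyzers and scoring.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: no analyzers, stemming or scoring.
final class PerFieldMatchSketch {

    /** Counts the query terms that occur as whole words in one field's text. */
    static int hits(String fieldText, List<String> terms) {
        Set<String> tokens = new HashSet<>(Arrays.asList(fieldText.toLowerCase().split("\\W+")));
        int count = 0;
        for (String term : terms) {
            if (tokens.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Collective mm: a term counts no matter which field it was found in. */
    static boolean matchesCollectively(Map<String, String> doc, List<String> terms, int mm) {
        return hits(String.join(" ", doc.values()), terms) >= mm;
    }

    /** Per-field mm: each listed field is checked against its own threshold. */
    static boolean matchesPerField(Map<String, String> doc, List<String> terms,
                                   Map<String, Integer> fieldMm, boolean requireAll) {
        boolean any = false;
        boolean all = true;
        for (Map.Entry<String, Integer> e : fieldMm.entrySet()) {
            boolean ok = hits(doc.getOrDefault(e.getKey(), ""), terms) >= e.getValue();
            any |= ok;
            all &= ok;
        }
        return requireAll ? all : any;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("adopt", "pet", "dog");
        // Each keyword sits in a different field.
        Map<String, String> scattered = Map.of(
                "title", "adopt animals",
                "description", "all kinds of pet animals",
                "keyword", "cat dog rabbit");
        // Two keywords sit together in the title.
        Map<String, String> focused = Map.of(
                "title", "how to adopt a dog",
                "description", "steps to adopt a dog from a shelter",
                "keyword", "dog adoption");
        Map<String, Integer> fieldMm = Map.of("title", 2);

        System.out.println(matchesCollectively(scattered, terms, 2));          // true
        System.out.println(matchesPerField(scattered, terms, fieldMm, false)); // false
        System.out.println(matchesPerField(focused, terms, fieldMm, false));   // true
    }
}
```

With a collective mm of 2 the scattered document still matches; once the title has to reach 2 on its own, only the focused document survives.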
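Configure new SearchComponent params in solrconfig.xml
to set up the per-field mm values. The format of the
field.mm value depends on your implementation; the processor shown in this post
reads field_mm=value pairs separated by || from minmatch.mm, plus an operator
from minmatch.op.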
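As a rough sketch of what such a configuration could look like: the parameter names minmatch.mm and minmatch.op and the `field_mm=value||field_mm=value` format come from the processor code in this post, while the handler name, the choice of putting the values in request handler defaults, and the values themselves are assumptions for illustration.

```xml
<!-- Hypothetical solrconfig.xml fragment; handler name and values are placeholders. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- field_mm=value pairs separated by "||"; "<" must be escaped in XML -->
    <str name="minmatch.mm">title_mm=2&lt;75%||description_mm=100%</str>
    <!-- AND: every listed field must satisfy its mm; otherwise at least one must -->
    <str name="minmatch.op">AND</str>
  </lst>
</requestHandler>
```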
Since the minimum match (mm) value is processed and set in
the QParser class, we will set the per-field minimum match criteria and update
the parameters through it.
Here is the QueryProcessor interface to implement:
    public interface QueryProcessor
    {
        void preprocess(QParser qPlugin);

        Query process(QParser qPlugin, Query prevQuery) throws ParseException;
    }
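The MinimumMatchFieldQueryProcessor implementation:

    // Package names below assume Solr/Lucene 3.x (ParseException, BooleanQuery(boolean));
    // adjust them for your version. CommonUtils is a project helper not shown in this post.
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.commons.lang.StringUtils;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.solr.search.QParser;

    public class MinimumMatchFieldQueryProcessor implements QueryProcessor
    {
        // field name -> mm spec for that field
        private Map<String, String> minMatchFieldsMap;
        private String mmOP;
        private String lang = null;

        @Override
        public void preprocess(QParser qPlugin) {
            String fieldsToMatch = qPlugin.getParams().get("minmatch.mm");
            mmOP = qPlugin.getParams().get("minmatch.op", "AND");
            minMatchFieldsMap = new HashMap<String, String>();

            // Nothing configured: leave the map empty instead of failing on split().
            if (StringUtils.isBlank(fieldsToMatch))
                return;

            // Entries look like field_mm=value and are separated by "||".
            String[] fields = fieldsToMatch.split("\\|\\|");
            for (String field : fields) {
                int indx = field.indexOf("=");
                if (indx != -1)
                {
                    minMatchFieldsMap.put(field.substring(0, indx).replaceAll("_mm", ""),
                            field.substring(indx + 1));
                }
            }
        }

        @Override
        public Query process(QParser qPlugin, Query prevQuery) throws ParseException {
            String queryString = CommonUtils.extractPureQuery(qPlugin.getString());

            // No query text or no per-field mm configured: keep the previous query.
            if (StringUtils.isBlank(queryString) || minMatchFieldsMap.isEmpty())
                return prevQuery;

            BooleanQuery bq = new BooleanQuery(true);
            for (Map.Entry<String, String> entry : minMatchFieldsMap.entrySet())
            {
                // One edismax sub-query per field, each with its own mm.
                String subQueryString = String.format("_query_:\"{!edismax qf=%s mm=%s}%s\"",
                        entry.getKey(), entry.getValue(), queryString);

                Query minMatchQuery = qPlugin.subQuery(subQueryString, "lucene").getQuery();

                if ("and".equalsIgnoreCase(mmOP))
                {
                    // AND: every field must satisfy its own mm.
                    bq.add(minMatchQuery, Occur.MUST);
                }
                else
                {
                    // Otherwise: at least one field must satisfy its mm.
                    bq.add(minMatchQuery, Occur.SHOULD);
                }
            }
            return bq;
        }
    }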
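To see what the processor hands to Solr without running a Solr instance, the parameter parsing and the sub-query string construction can be reproduced with plain Java. This is a standalone sketch: the class name, the sample parameter value and the sample query are assumptions for illustration, and it adds whitespace trimming that the code above does not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch: mirrors the parsing and String.format steps, no Solr classes needed.
final class MinMatchParamSketch {

    /** Parses "title_mm=2<75%||description_mm=100%" into field -> mm spec. */
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String pair : raw.split("\\|\\|")) {
            int idx = pair.indexOf('=');
            if (idx != -1) {
                // Strip the "_mm" suffix to recover the index field name.
                String field = pair.substring(0, idx).trim().replace("_mm", "");
                result.put(field, pair.substring(idx + 1).trim());
            }
        }
        return result;
    }

    /** Builds the nested edismax query string for one field. */
    static String subQuery(String field, String mm, String userQuery) {
        return String.format("_query_:\"{!edismax qf=%s mm=%s}%s\"", field, mm, userQuery);
    }

    public static void main(String[] args) {
        Map<String, String> perField = parse("title_mm=2<75%||description_mm=100%");
        for (Map.Entry<String, String> e : perField.entrySet()) {
            System.out.println(subQuery(e.getKey(), e.getValue(), "adopt a pet dog"));
        }
        // prints:
        // _query_:"{!edismax qf=title mm=2<75%}adopt a pet dog"
        // _query_:"{!edismax qf=description mm=100%}adopt a pet dog"
    }
}
```

One thing the sketch makes visible: the user query is placed inside a double-quoted string, so a query that itself contains double quotes would need escaping before it is embedded this way.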
In the next post I will talk about how to add a customized
QueryProcessor.