If your use case demands strict matching, here is an example of how you can restrict matching based on token count. In the example below, we narrow the search to documents whose token count is within +/- one of the number of keywords in the query. You can change the range to span +/- any count. You can also tune the matching by adding filters to the field analyzer, for example stop word filters or duplicate removal.
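To see how the analyzer settings change the count that the range is built on, here is a minimal sketch in plain Java with no Solr dependency. It assumes lowercasing plus whitespace splitting as a rough stand-in for the real field analyzer. The class name TokenCountSketch, the stop word list, and both method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a Solr analyzer chain; not part of Solr.
final class TokenCountSketch {

    // Assumed stop word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new LinkedHashSet<>(Arrays.asList("a", "an", "the", "of"));

    // Counts tokens after lowercasing, stop word removal and de-duplication.
    static int countTokens(String text) {
        Set<String> unique = new LinkedHashSet<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                unique.add(token);
            }
        }
        return unique.size();
    }

    // True when a document's token count lies within +/- delta of the query's.
    static boolean withinRange(int queryCount, int docCount, int delta) {
        return Math.abs(queryCount - docCount) <= delta;
    }
}
```

With these assumed filters, a five-word title made mostly of stop words counts as only two tokens, so the choice of filters directly decides which documents fall inside the +/- range.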
Setting Token Count field
First we add the token count field to our schema to hold the count of tokens for the field “title”.
<field name="titleToken" type="int" indexed="true" stored="true" />
<field name="title" type="text" indexed="true" stored="true" />
Next we extend the SearchComponent class to make use of the titleToken field, which holds the count of tokens in the field “title” after the analyzer settings take effect. In the example, these are the analyzer settings for fieldType="text".
Extend SearchComponent
Here we extend SearchComponent to read the field or fields on which we want to restrict matching based on token count, “title” for example. The analyzer setup is read in the inform() method so that the settings for the title field in schema.xml are applied.
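public class QueryTokenComponent extends SearchComponent implements SolrCoreAware {

    private String fieldName = "title";
    private Analyzer analyzer;

    @Override
    public void init(NamedList args) {
        super.init(args);
    }

    @Override
    public void inform(SolrCore core) {
        // Analyzer from the schema; applies the analysis configured for each field.
        analyzer = core.getSchema().getAnalyzer();
    }

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Filled in by the prepare() override shown next.
    }

    // process() and getDescription() must also be implemented; omitted here.
}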
Next we override the prepare() method in the above class to add the token range in the filter query and update the ModifiableSolrParams with the new filter query on the token range.
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        SolrQueryRequest req = rb.req;
        SolrParams params = req.getParams();
        ModifiableSolrParams modparams = new ModifiableSolrParams(params);

        String queryString = modparams.get(CommonParams.Q);
        if (queryString == null) {
            return; // no q parameter, so there is nothing to count
        }

        // Count the query tokens using the analysis configured for the title field.
        int tokenCnt = AnalyzerUtils.getTokens(analyzer, fieldName, queryString);

        // Restrict matches to documents whose token count is within +/- 1.
        modparams.add(CommonParams.FQ,
                "titleToken:[" + (tokenCnt - 1) + " TO " + (tokenCnt + 1) + "]");
        req.setParams(modparams);
    }
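Since the range can span +/- any count, the filter string can be built by a small helper that takes the delta as a parameter. The following is only a sketch in plain Java: the class TokenRangeFilter and its build method are hypothetical names, not part of Solr, and clamping the lower bound at zero is an added assumption, since a token count cannot be negative.

```java
// Hypothetical helper; not part of Solr.
final class TokenRangeFilter {

    // Builds a Solr range filter such as "titleToken:[2 TO 4]".
    static String build(String field, int tokenCount, int delta) {
        // Assumption: clamp at zero, because a token count cannot be negative.
        int lower = Math.max(0, tokenCount - delta);
        int upper = tokenCount + delta;
        return field + ":[" + lower + " TO " + upper + "]";
    }
}
```

Under these assumptions, the string returned by TokenRangeFilter.build("titleToken", tokenCnt, 1) could be passed as the filter query value in prepare(), and widening the match would only mean changing the last argument.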
And here is what the getTokens method looks like:
    public static int getTokens(Analyzer analyzer, String field, String query) throws IOException {
        TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(query));
        CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        try {
            // The TokenStream contract calls for reset() before incrementToken().
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                tokens.add(termAttribute.toString());
            }
            tokenStream.end();
        } finally {
            tokenStream.close();
        }
        return tokens.size();
    }
Debug
If you’d like to see the token count or the tokens that come into play, add a field to schema.xml and populate it in an UpdateRequestProcessor extension.
…
class TokenCountProcessHandler extends UpdateRequestProcessor {

    private Analyzer analyzer;

    public TokenCountProcessHandler(SolrQueryRequest req,
                                    SolrQueryResponse rsp,
                                    UpdateRequestProcessor next) {
        super(next);
        analyzer = req.getSchema().getAnalyzer();
    }
    …
    @Override
    public void processAdd(final AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object v = doc.getFieldValue("title");
        if (v != null) {
            String title = v.toString();
            // getTokens returns the token count for the analyzed title.
            doc.addField("wrd_cnt", AnalyzerUtils.getTokens(analyzer, "title", title));
        }
        cmd.solrDoc = doc;
        // pass it up the chain
        super.processAdd(cmd);
    }
    …
}