Monday, 1 September 2014

Some concepts of Solr

Q. Difference between the standard query parser and the DisMax and eDisMax query parsers
A: DisMax is short for "Disjunction Max", and it is a popular query mode in Solr. The default (standard Lucene) query parser is strict. It understands the full query syntax, but it does not handle malformed input gracefully: a stray special character, such as an unbalanced quote or parenthesis, makes it throw an exception. The DisMax parser is designed for raw user input and rarely throws exceptions. It accepts only a simple subset of the syntax (quoted phrases, and + / - for mandatory and prohibited terms) and lets you weight the fields being searched. A query that mixes logical operators and boosts, such as ( "Cancer" AND "Blood" ) OR "Blood Cancer"^2, is handled by eDisMax (or the standard parser) rather than by plain DisMax.
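Because the standard parser throws on stray special characters, a common workaround is to escape them before sending user input. SolrJ provides ClientUtils.escapeQueryChars for this. The class below is a stdlib-only sketch of the same idea; the class name and the exact character set are assumptions, not a copy of SolrJ's implementation.

```java
public class QueryEscaper {
    // Characters the standard Lucene query parser treats as syntax.
    // This set is an assumption modelled on the Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash so the standard
    // parser reads it as a literal character instead of as syntax.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unescaped, the unbalanced parenthesis would make the standard
        // parser throw; escaped, it is searched as a literal character.
        System.out.println(escape("blood (cancer"));
    }
}
```

Escaping turns the whole input into literal terms, so operators and boosts typed by the user stop working. Apply it only to text that should not carry query syntax.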

The Extended DisMax Query Parser (eDisMax) is a robust parser designed to process advanced user input directly. It searches for the query words across multiple fields with different boosts, based on the significance of each field. Additional options let you influence the score based on rules specific to each use case (independent of user input).
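A minimal sketch of an eDisMax request follows. It only builds the query string for Solr's /select handler, so it runs without SolrJ. defType, q, qf and pf are real (e)DisMax parameters, but the field names title and body and the boost values are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EdismaxRequest {
    // Builds the URL query string for a /select request that uses eDisMax.
    static String build(String userInput) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("defType", "edismax");  // select the Extended DisMax parser
        params.put("q", userInput);        // raw user input, passed through as-is
        params.put("qf", "title^3 body");  // search both fields; title matches boosted by 3 (hypothetical fields)
        params.put("pf", "title^5");       // extra boost when the query appears as a phrase in title
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println("/select?" + build("blood cancer symptoms"));
    }
}
```

Keeping the field weights in qf and pf on the server side means the user never has to type boosts. This is the field-level weighting the eDisMax paragraph describes.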

Q. Using token-based searching in Solr
A. It's simple :) First, prepare a corpus of words and phrases from your domain and give each one a weight based on its relevance and importance. Then, when a search query arrives, split it into word n-grams, replace each n-gram with a clause boosted according to its weight, and combine the clauses carefully with AND and OR.

For example, "Blood cancer symptoms" can be converted into "Blood Cancer symptoms"^100 OR ("Blood cancer" AND symptom)^10 OR (Blood AND Cancer AND symptom)^5 OR "Blood Cancer"

This query returns much more relevant results than the plain query.
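The steps above can be sketched in plain Java. This is a minimal sketch under several assumptions: the domain corpus is a hypothetical map from lower-case phrases to boosts, only word bigrams are treated as n-grams, the boosts 100 and 5 simply mirror the example, and the class and method names are made up for illustration. It does not escape special characters, so it assumes plain-word input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class BoostedQueryBuilder {

    // Builds a boosted Lucene-syntax query from raw user input.
    // phraseWeights is a hypothetical domain corpus: known two-word phrases
    // (lower-case) mapped to the boost they should receive.
    static String build(String userQuery, Map<String, Integer> phraseWeights) {
        List<String> terms = Arrays.asList(userQuery.trim().toLowerCase().split("\\s+"));
        List<String> clauses = new ArrayList<>();

        // 1. The whole input as an exact phrase gets the highest boost.
        clauses.add("\"" + String.join(" ", terms) + "\"^100");

        // 2. Each word bigram found in the corpus is kept together as a phrase,
        //    ANDed with the remaining terms, and boosted by its corpus weight.
        for (int i = 0; i + 1 < terms.size(); i++) {
            String bigram = terms.get(i) + " " + terms.get(i + 1);
            Integer weight = phraseWeights.get(bigram);
            if (weight == null) {
                continue;
            }
            List<String> rest = new ArrayList<>(terms);
            rest.remove(i + 1); // remove by index: drop the bigram's two words
            rest.remove(i);
            StringBuilder clause = new StringBuilder("(\"" + bigram + "\"");
            for (String t : rest) {
                clause.append(" AND ").append(t);
            }
            clauses.add(clause.append(")^").append(weight).toString());
        }

        // 3. All individual terms must match, with a low boost.
        clauses.add("(" + String.join(" AND ", terms) + ")^5");
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = Map.of("blood cancer", 10);
        System.out.println(build("Blood cancer symptoms", weights));
    }
}
```

With the single corpus entry "blood cancer" weighted 10, the output mirrors the first three clauses of the hand-written example above. A fuller version might also add unboosted clauses for known phrases, like the trailing "Blood Cancer".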

Q. Using the highlighter to highlight matched keywords
A : First, enable highlighting when you build the query. It's very simple: just call query.setHighlight(true), and set any other highlighting parameters you need.

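        // query is a SolrJ SolrQuery (org.apache.solr.client.solrj.SolrQuery)
        query.setHighlight(true);
        query.setParam("hl.fl", highlightingField);          // field(s) to highlight
        query.setParam("hl.mergeContiguous", "true");        // merge adjacent fragments into one snippet
        query.setParam("hl.usePhraseHighlighter", "true");   // highlight phrase queries as whole phrases
        query.setParam("hl.simple.pre", "<em>");             // tag before each match (assumed; Solr's default)
        query.setParam("hl.simple.post", "</em>");           // tag after each match (assumed; Solr's default)
        query.setParam("hl.snippets", "2");                  // at most 2 snippets per field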

When the response comes back, extract the snippets from it. QueryResponse.getHighlighting() returns a Map<String, Map<String, List<String>>>: the outer key is the document's unique key, the inner key is the field name, and the value is the list of snippets.

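        // requires java.util.List and java.util.Map; response is the SolrJ QueryResponse,
        // object is your own result bean
        Map<String, Map<String, List<String>>> highlighting = response.getHighlighting();
        String id = object.getId().toString();
        if (highlighting != null && highlighting.get(id) != null) {
            // snippets produced for this document's highlighted field
            List<String> highlightSnippets = highlighting.get(id).get(Highlighted_Field_Name);
            if (highlightSnippets != null) {
                StringBuilder contentToShow = new StringBuilder();
                for (String snippet : highlightSnippets) {
                    // keep adding snippets until about 170 characters are collected
                    if (contentToShow.length() < 170) {
                        contentToShow.append(snippet).append(" ...");
                    } else {
                        break;
                    }
                }
                object.setContentToShow(contentToShow.toString());
            }
        }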

For the full list of highlighting parameters, see http://wiki.apache.org/solr/HighlightingParameters