
2016, Retracing the Solr Long March: Multi-Field Queries in Solr

2016-06-15 18:34
When building search, we often need to search across several fields. Suppose the index holds documents of the following form:

document:{title:"Multi-field queries in Solr",content:"Multi-field queries in Solr ********************",describe:"A technical article about Solr"}

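An article search checks whether the title (title), body (content), and summary (describe) contain the keyword. The only difference between the fields is their weight: title usually carries the highest weight and describe the lowest.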

In Lucene this can be done as follows:

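// Imports assume the Lucene 5.x/6.x package layout.
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

Map<String, Float> boosts = new HashMap<String, Float>();
boosts.put("title", 1.0f);      // highest weight
boosts.put("content", 0.1f);
boosts.put("describe", 0.01f);  // lowest weight

String[] fields = new String[]{"title", "content", "describe"};
QueryParser parser = new MultiFieldQueryParser(fields, new WhitespaceAnalyzer(), boosts);
Query query = parser.parse("直播");  // throws ParseException on malformed input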


In Solr this can be done as follows:

Solr offers two ways to get a similar search effect, but the resulting scores can differ considerably.

1: Concatenate the clauses directly in q

q=title:直播^1+OR+content:直播^0.1+OR+describe:直播^0.01

2: Pass the plain term in q and the per-field boosts in qf

q=直播&defType=dismax/edismax/MydefType&qf=title^1+content^0.1+describe^0.01

Here defType takes one of the three values; MydefType stands for a custom defType.
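The two request forms above can be assembled programmatically. The following is only a sketch using the JDK's URLEncoder; the class and method names (SolrMultiFieldParams, concatenatedQ, dismaxParams) are made up for illustration and are not part of Solr or SolrJ. Boosts are kept as strings so they appear in the query exactly as written in the article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the two query strings discussed above. */
final class SolrMultiFieldParams {

    /** URL-encodes one value as UTF-8 (spaces become '+'). */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Option 1: field:term^boost clauses joined with OR, all inside q. */
    static String concatenatedQ(String term, Map<String, String> boosts) {
        StringBuilder q = new StringBuilder();
        for (Map.Entry<String, String> e : boosts.entrySet()) {
            if (q.length() > 0) {
                q.append(" OR ");
            }
            q.append(e.getKey()).append(':').append(term)
             .append('^').append(e.getValue());
        }
        return "q=" + enc(q.toString());
    }

    /** Option 2: plain term in q, boosts moved into qf. */
    static String dismaxParams(String term, String defType, Map<String, String> boosts) {
        StringBuilder qf = new StringBuilder();
        for (Map.Entry<String, String> e : boosts.entrySet()) {
            if (qf.length() > 0) {
                qf.append(' ');
            }
            qf.append(e.getKey()).append('^').append(e.getValue());
        }
        return "q=" + enc(term) + "&defType=" + enc(defType) + "&qf=" + enc(qf.toString());
    }

    public static void main(String[] args) {
        Map<String, String> boosts = new LinkedHashMap<>(); // keeps field order
        boosts.put("title", "1");
        boosts.put("content", "0.1");
        boosts.put("describe", "0.01");
        String term = "\u76f4\u64ad"; // the search term used in the article
        System.out.println(concatenatedQ(term, boosts));
        System.out.println(dismaxParams(term, "edismax", boosts));
    }
}
```

The encoder also escapes ':' and '^', which Solr decodes back to the same query as the hand-written forms above.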

In many cases the two methods produce the same ranking, but they compute scores by different rules, so the actual scores differ.

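The first approach adds up the scores of the individual fields: the final match score is the sum of the query's scores in every field that matches. This is effectively what the Lucene code above does.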

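The second approach scores each field separately and then takes the score of the highest-scoring field (the other fields contribute only through the tie-breaker multiplier, if one is set). The same max-based logic is visible in the Lucene source that Solr uses: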

org.apache.lucene.search.DisjunctionMaxQuery:

public float getValueForNormalization() throws IOException {
    float max = 0.0f, sum = 0.0f;
    for (Weight currentWeight : weights) {
        float sub = currentWeight.getValueForNormalization();
        sum += sub;
        max = Math.max(max, sub);
    }
    float boost = getBoost();
    return (((sum - max) * tieBreakerMultiplier * tieBreakerMultiplier) + max) * boost * boost;
}
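Note that the method quoted above computes the value fed into query normalization, not the per-document score, although it follows the same "max plus tie-weighted rest" shape. As a rough illustration, the arithmetic can be restated with plain floats. This is a sketch, not Lucene code: the class name DisMaxNormalizationSketch and the array parameter are assumptions that stand in for the Weight objects.

```java
/** Standalone restatement of the arithmetic in the method above (no Lucene types). */
final class DisMaxNormalizationSketch {

    /**
     * subValues stands in for each sub-weight's getValueForNormalization() result.
     * tieBreakerMultiplier and boost play the same roles as in the quoted method.
     */
    static float valueForNormalization(float[] subValues, float tieBreakerMultiplier, float boost) {
        float max = 0.0f, sum = 0.0f;
        for (float sub : subValues) {
            sum += sub;
            max = Math.max(max, sub);
        }
        // The non-max values count only through tie^2; the whole result scales by boost^2.
        return (((sum - max) * tieBreakerMultiplier * tieBreakerMultiplier) + max) * boost * boost;
    }

    public static void main(String[] args) {
        float[] subs = {4.0f, 1.0f, 0.25f}; // made-up inputs, not from the article
        System.out.println(valueForNormalization(subs, 0.0f, 2.0f)); // tie = 0: only the max counts
        System.out.println(valueForNormalization(subs, 0.5f, 2.0f)); // partial credit for the rest
        System.out.println(valueForNormalization(subs, 1.0f, 2.0f)); // tie = 1: same as a plain sum
    }
}
```

With a tie-breaker of 0 the result depends only on the largest sub-value, and with a tie-breaker of 1 it collapses to the plain sum.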


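The debugQuery output also shows the difference between the two scoring methods (the first entry is the dismax-style query, the second the concatenated q):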

"1067563474": "\n4.9853425 = max of:\n  2.2417927 = weight(yy_keyword:直播^0.2 in 3527) [ApplistSolrSimilarity], result of:\n    2.2417927 = score(doc=3527,freq=1.0), product of:\n      0.23708561 = queryWeight, product of:\n        0.2 = boost\n        9.455626 = idf(docFreq=17, maxDocs=84626)\n        0.12536749 = queryNorm\n      9.455626 = fieldWeight in 3527, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        9.455626 = idf(docFreq=17, maxDocs=84626)\n        1.0 = fieldNorm(doc=3527)\n  0.6107991 = weight(describe:直播^0.1 in 3527) [ApplistSolrSimilarity], result of:\n    0.6107991 = score(doc=3527,freq=1.0), product of:\n      0.08750677 = queryWeight, product of:\n        0.1 = boost\n        6.980021 = idf(docFreq=213, maxDocs=84626)\n        0.12536749 = queryNorm\n      6.980021 = fieldWeight in 3527, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        6.980021 = idf(docFreq=213, maxDocs=84626)\n        1.0 = fieldNorm(doc=3527)\n  4.9853425 = weight(title:直播 in 3527) [ApplistSolrSimilarity], result of:\n    4.9853425 = score(doc=3527,freq=1.0), product of:\n      0.99999994 = queryWeight, product of:\n        7.976549 = idf(docFreq=78, maxDocs=84626)\n        0.12536749 = queryNorm\n      4.985343 = fieldWeight in 3527, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        7.976549 = idf(docFreq=78, maxDocs=84626)\n        0.625 = fieldNorm(doc=3527)\n",


"1067563474": "\n5.574838 = sum of:\n  4.9663644 = weight(title:直播 in 3527) [ApplistSolrSimilarity], result of:\n    4.9663644 = score(doc=3527,freq=1.0), product of:\n      0.9961931 = queryWeight, product of:\n        7.976549 = idf(docFreq=78, maxDocs=84626)\n        0.12489024 = queryNorm\n      4.985343 = fieldWeight in 3527, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        7.976549 = idf(docFreq=78, maxDocs=84626)\n        0.625 = fieldNorm(doc=3527)\n  0.6084739 = weight(describe:直播^0.1 in 3527) [ApplistSolrSimilarity], result of:\n    0.6084739 = score(doc=3527,freq=1.0), product of:\n      0.08717365 = queryWeight, product of:\n        0.1 = boost\n        6.980021 = idf(docFreq=213, maxDocs=84626)\n        0.12489024 = queryNorm\n      6.980021 = fieldWeight in 3527, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        6.980021 = idf(docFreq=213, maxDocs=84626)\n        1.0 = fieldNorm(doc=3527)\n",


As the output shows, one takes the max (the highest field score) and the other takes the sum (all field scores added together).
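To make the max-versus-sum difference concrete, here is a small sketch that recombines the per-field scores copied from the two explain outputs above. It assumes a tie-breaker of 0 for the max case, and the class and method names (FieldScoreCombiner, sumScore, disMaxScore) are hypothetical; the per-field numbers differ slightly between the two outputs because queryNorm differs.

```java
/** Hypothetical helper comparing the two ways of combining per-field scores. */
final class FieldScoreCombiner {

    /** Boolean OR style: every matching field's score is added. */
    static double sumScore(double[] fieldScores) {
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
        }
        return sum;
    }

    /** DisMax style: the best field wins; the rest count only via tie. */
    static double disMaxScore(double[] fieldScores, double tie) {
        double max = 0.0, sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Per-field scores from the "max of" explain output (yy_keyword, describe, title).
        double[] disMaxFields = {2.2417927, 0.6107991, 4.9853425};
        // Per-field scores from the "sum of" explain output (title, describe).
        double[] sumFields = {4.9663644, 0.6084739};

        System.out.println("dismax, tie=0: " + disMaxScore(disMaxFields, 0.0));
        System.out.println("sum:           " + sumScore(sumFields));
    }
}
```

Raising tie toward 1 moves the dismax result toward the plain sum of the same field scores, which is one way to blend the two behaviors.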
Tags: solr, search, edismax, qf