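I found the ik Chinese word-segmentation plugin on GitHub, and when writing the mapping I naturally used these parameters (following the example in the plugin's official documentation):
{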
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}
curl 127.0.0.1:9200/test/_search | jq
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "Video_1",
"_score": 1,
"_source": {
"id": 1,
"title": "打火车"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "Video_2",
"_score": 1,
"_source": {
"id": 2,
"title": "火车"
}
}
]
}
}
curl '127.0.0.1:9200/test/_search?q=打火车' | jq
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.21110919,
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "Video_2",
"_score": 0.21110919,
"_source": {
"id": 2,
"title": "火车"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "Video_1",
"_score": 0.160443,
"_source": {
"id": 1,
"title": "打火车"
}
}
]
}
}
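Surprisingly, the 火车 document scores 0.21110919, which is higher than the 0.160443 of the 打火车 document. So I looked at the term vector of the 打火车 document:
curl '127.0.0.1:9200/test/_doc/Video_1/_termvectors?fields=title' | jq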
{
"_index": "test",
"_type": "_doc",
"_id": "Video_1",
"_version": 1,
"found": true,
"took": 0,
"term_vectors": {
"title": {
"field_statistics": {
"sum_doc_freq": 3,
"doc_count": 2,
"sum_ttf": 3
},
"terms": {
"打火": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 2
}
]
},
"火车": {
"term_freq": 1,
"tokens": [
{
"position": 1,
"start_offset": 1,
"end_offset": 3
}
]
}
}
}
}
}
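The statistics in this term vector are enough to reproduce both scores by hand. Below is a minimal sketch, assuming the index uses Elasticsearch's default BM25 similarity with k1 = 1.2 and b = 0.75 and that the reported score includes the (k1 + 1) factor; the post does not state the similarity settings, and `bm25_term_score` is a hypothetical helper name. It also assumes that 火车 is the only term of the query that exists in the index.

```python
import math

K1, B = 1.2, 0.75  # assumed Elasticsearch BM25 defaults, not stated in the post


def bm25_term_score(freq, doc_len, avg_len, doc_count, doc_freq):
    """BM25 score of a single query term in a single document."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = K1 * (1 - B + B * doc_len / avg_len)
    return idf * (K1 + 1) * freq / (freq + length_norm)


# field_statistics from the term vector: sum_ttf = 3, doc_count = 2
avg_len = 3 / 2

# 火车 occurs once in each of the two documents, so doc_freq = 2
video_1 = bm25_term_score(1, doc_len=2, avg_len=avg_len, doc_count=2, doc_freq=2)  # terms: 打火, 火车
video_2 = bm25_term_score(1, doc_len=1, avg_len=avg_len, doc_count=2, doc_freq=2)  # terms: 火车

print(f"Video_1 {video_1:.6f}")
print(f"Video_2 {video_2:.6f}")
```

Under these assumptions the sketch gives roughly 0.160443 for Video_1 and 0.211109 for Video_2, in line with the search response. The only input that differs between the two documents is the field length: 2 terms versus 1.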
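The title was split into two terms, 打火 and 火车, so something is clearly off here (although from the search engine's point of view nothing is wrong). The query is analyzed with ik_smart, so only its 火车 term matches the index. In the 打火车 document, 火车 earns a score, but the extra term 打火 makes the field longer and lowers that score, so the 火车 document ranks first. Using ik_smart for indexing as well:
{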
"properties": {
"title": {
"type": "text",
"analyzer": "ik_smart",
"search_analyzer": "ik_smart"
}
}
}
curl '127.0.0.1:9200/test/_doc/Video_1/_termvectors?fields=title' | jq
{
"_index": "test",
"_type": "_doc",
"_id": "Video_1",
"_version": 1,
"found": true,
"took": 0,
"term_vectors": {
"title": {
"field_statistics": {
"sum_doc_freq": 3,
"doc_count": 2,
"sum_ttf": 3
},
"terms": {
"打": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 1
}
]
},
"火车": {
"term_freq": 1,
"tokens": [
{
"position": 1,
"start_offset": 1,
"end_offset": 3
}
]
}
}
}
}
}
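The offsets make the difference between the two analyzers visible: with ik_max_word the tokens 打火 and 火车 share the character 火, while ik_smart produces tokens that do not overlap. A small sketch that checks this from the offsets; the helper `overlapping_pairs` is hypothetical, and the tuples are copied by hand from the two term vectors of Video_1 in this post:

```python
def overlapping_pairs(tokens):
    """Return pairs of terms whose character ranges [start, end) overlap."""
    pairs = []
    for i, (term_a, start_a, end_a) in enumerate(tokens):
        for term_b, start_b, end_b in tokens[i + 1:]:
            if start_a < end_b and start_b < end_a:
                pairs.append((term_a, term_b))
    return pairs


# (term, start_offset, end_offset) for the title 打火车
ik_max_word_tokens = [("打火", 0, 2), ("火车", 1, 3)]
ik_smart_tokens = [("打", 0, 1), ("火车", 1, 3)]

print(overlapping_pairs(ik_max_word_tokens))  # [('打火', '火车')]
print(overlapping_pairs(ik_smart_tokens))     # []
```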
curl '127.0.0.1:9200/test/_search?q=打火车' | jq
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.77041256,
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "Video_1",
"_score": 0.77041256,
"_source": {
"id": 1,
"title": "打火车"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "Video_2",
"_score": 0.21110919,
"_source": {
"id": 2,
"title": "火车"
}
}
]
}
}
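The change in ranking can be checked offline by scoring the same query against the terms each mapping actually indexed. This is only a sketch under assumptions: default BM25 parameters (k1 = 1.2, b = 0.75, with the (k1 + 1) factor in the score), a query split into 打 and 火车 as ik_smart did for the title, and a hypothetical helper `bm25_scores` working on a tiny in-memory index rather than on Elasticsearch itself.

```python
import math

K1, B = 1.2, 0.75  # assumed Elasticsearch BM25 defaults, not stated in the post


def bm25_scores(index, query_terms):
    """Return {doc_id: score} for a tiny in-memory index {doc_id: [terms]}."""
    n_docs = len(index)
    avg_len = sum(len(terms) for terms in index.values()) / n_docs
    scores = {}
    for doc_id, terms in index.items():
        total = 0.0
        for term in query_terms:
            freq = terms.count(term)
            if freq == 0:
                continue  # term not in this document, contributes nothing
            doc_freq = sum(1 for other in index.values() if term in other)
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            length_norm = K1 * (1 - B + B * len(terms) / avg_len)
            total += idf * (K1 + 1) * freq / (freq + length_norm)
        scores[doc_id] = total
    return scores


query = ["打", "火车"]  # assumed ik_smart output for the query 打火车

# Indexed terms copied from the term vectors shown in the post
max_word_index = {"Video_1": ["打火", "火车"], "Video_2": ["火车"]}
smart_index = {"Video_1": ["打", "火车"], "Video_2": ["火车"]}

for name, index in [("ik_max_word", max_word_index), ("ik_smart", smart_index)]:
    ranked = sorted(bm25_scores(index, query).items(), key=lambda kv: -kv[1])
    print(name, [(doc, round(score, 4)) for doc, score in ranked])
```

With the ik_max_word index only 火车 matches and the shorter Video_2 wins; with the ik_smart index the rare term 打 also matches Video_1, which lifts it to about 0.7704 and puts it first, as in the response above.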