A problem with Elasticsearch aggregation results

2022-01-30 10:42:32 +08:00
 rqxiao
While looking into pagination for Elasticsearch aggregations recently, I ran into the problem of inaccurate aggregation results.

First, create an index (the inaccuracy only shows up when the index has more than one shard):
PUT /my_aggs
{ "settings": { "number_of_shards": 3}}

Then bulk-index the test documents:
POST /my_aggs/_bulk
{ "index": {}}
{ "money": 50, "bid":"11" }
{ "index": {}}
{ "money": 40, "bid":"11" }
{ "index": {}}
{ "money": 20, "bid":"11" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"10" }
{ "index": {}}
{ "money": 10, "bid":"9" }
{ "index": {}}
{ "money": 20, "bid":"9" }
{ "index": {}}
{ "money": 20, "bid":"9" }
{ "index": {}}
{ "money": 20, "bid":"9" }
{ "index": {}}
{ "money": 20, "bid":"9" }
{ "index": {}}
{ "money": 60, "bid":"8" }
{ "index": {}}
{ "money": 10, "bid":"8" }
{ "index": {}}
{ "money": 10, "bid":"8" }
{ "index": {}}
{ "money": 60, "bid":"7" }
{ "index": {}}
{ "money": 10, "bid":"7" }
{ "index": {}}
{ "money": 20, "bid":"6" }
{ "index": {}}
{ "money": 40, "bid":"6" }
{ "index": {}}
{ "money": 10, "bid":"5" }
{ "index": {}}
{ "money": 20, "bid":"5" }
{ "index": {}}
{ "money": 20, "bid":"5" }
{ "index": {}}
{ "money": 40, "bid":"4" }
{ "index": {}}
{ "money": 30, "bid":"3" }
{ "index": {}}
{ "money": 10, "bid":"2" }
{ "index": {}}
{ "money": 10, "bid":"2" }
{ "index": {}}
{ "money": 10, "bid":"1" }

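At first I could not reproduce an incorrect result no matter how I tested. It only appeared after I lowered shard_size (the official docs say the default is size * 1.5 + 10).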

size is the number of top buckets you want returned.
shard_size is the number of buckets Elasticsearch collects from each shard.
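
A rough illustration of why the default settings hid the problem, assuming the documented default of size * 1.5 + 10 and assuming the result is truncated to an integer. The helper name default_shard_size is made up for this sketch. With size 3 each shard is asked for 14 buckets, which is more than the 11 distinct bid values in the test data, so every shard returns all of its buckets and nothing is cut off.

```python
def default_shard_size(size: int) -> int:
    """Default shard_size per the documented formula size * 1.5 + 10.

    Truncating to an integer is an assumption of this sketch.
    """
    return int(size * 1.5 + 10)


DISTINCT_BIDS = 11  # bid values "1" through "11" in the bulk data above

for size in (3, 10):
    shard_size = default_shard_size(size)
    covers_all = shard_size >= DISTINCT_BIDS
    print(f"size={size} -> shard_size={shard_size}, covers all bids: {covers_all}")
```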

GET my_aggs/_search
{
  "from": 0,
  "size": 0,
  "aggs": {
    "aggs_bid": {
      "terms": {
        "field": "bid.keyword",
        "size": 3,
        "shard_size": 3,
        "order": {
          "aggs_money": "desc"
        }
      },
      "aggs": {
        "aggs_money": {
          "sum": {
            "field": "money"
          }
        }
      }
    }
  }
}

---------- Result ----------

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 33,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggs_bid" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 20,
      "buckets" : [
        {
          "key" : "11",
          "doc_count" : 3,
          "aggs_money" : {
            "value" : 110.0
          }
        },
        {
          "key" : "10",
          "doc_count" : 8,
          "aggs_money" : {
            "value" : 80.0
          }
        },
        {
          "key" : "8",
          "doc_count" : 2,
          "aggs_money" : {
            "value" : 70.0
          }
        }
      ]
    }
  }
}
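
To make the effect easier to see, here is a simplified simulation of the shard-level truncation. It is only a sketch: the placement of the 33 documents on the three shards is invented (the real routing depends on the auto-generated document IDs), and the model reduces the terms aggregation to "each shard reports its local top shard_size buckets by sum, the coordinator adds up what it receives". All function names are hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL placement of the 33 test documents on 3 shards, as
# {bid: [money, ...]} per shard. The real distribution is unknown.
SHARDS = [
    {"11": [50, 40], "8": [60, 10], "10": [10] * 5,
     "9": [20, 20], "5": [20, 20], "4": [40]},
    {"11": [20], "10": [10] * 3, "6": [40], "8": [10], "2": [10],
     "1": [10], "5": [10], "7": [10], "9": [10]},
    {"10": [10] * 2, "9": [20, 20], "7": [60], "6": [20],
     "3": [30], "2": [10]},
]


def top_buckets(sums, n):
    """Return the n (bid, total) pairs with the largest totals."""
    return sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:n]


def exact_totals(shards):
    """Sum money per bid over all documents on all shards."""
    totals = defaultdict(int)
    for shard in shards:
        for bid, amounts in shard.items():
            totals[bid] += sum(amounts)
    return dict(totals)


def merged_totals(shards, shard_size):
    """Each shard reports only its local top shard_size buckets;
    the coordinator adds up whatever it receives."""
    merged = defaultdict(int)
    for shard in shards:
        local = {bid: sum(amounts) for bid, amounts in shard.items()}
        for bid, total in top_buckets(local, shard_size):
            merged[bid] += total
    return dict(merged)


exact_top3 = top_buckets(exact_totals(SHARDS), 3)
approx_top3 = top_buckets(merged_totals(SHARDS, shard_size=3), 3)
print("exact :", exact_top3)   # [('11', 110), ('10', 100), ('9', 90)]
print("approx:", approx_top3)  # [('11', 110), ('10', 80), ('8', 70)]
```

Under this made-up placement the sketch yields the same three buckets as the response above (11: 110, 10: 80, 8: 70), while the exact totals from the bulk data put 11, 10 and 9 on top with 110, 100 and 90. Calling merged_totals with shard_size=14 returns the exact totals, because no shard drops any bucket.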
1 reply
rqxiao
2022-01-30 10:46:20 +08:00
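So I would like to ask what approach people usually take for Elasticsearch aggregations. For now I simply set size to Integer.MAX. Another option would be to increase the number of shards. How do you actually handle this aggregation problem in production?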
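
For comparison, one variation on the size workaround is to leave size small and raise only shard_size, so that each shard sends more candidate buckets while the response still contains just the top few. The sketch below only builds the request body for the query shown earlier. The helper name and the value 1000 are illustrative assumptions, and whether a given shard_size is large enough depends on how many distinct bid values the index holds.

```python
import json


def terms_by_sum_request(size: int, shard_size: int) -> dict:
    """Build the search body from the post with an explicit shard_size.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    return {
        "size": 0,  # no hits, aggregations only
        "aggs": {
            "aggs_bid": {
                "terms": {
                    "field": "bid.keyword",
                    "size": size,              # buckets returned to the client
                    "shard_size": shard_size,  # buckets requested from each shard
                    "order": {"aggs_money": "desc"},
                },
                "aggs": {"aggs_money": {"sum": {"field": "money"}}},
            }
        },
    }


body = terms_by_sum_request(size=3, shard_size=1000)
print(json.dumps(body, indent=2))
```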

https://www.v2ex.com/t/831374