为什么数据库分了表后性能会直线下降?

2020-06-17 23:10:58 +08:00
 Pendragon

原先有按月分的 12 张表,每张表里大概有 2000w 左右的数据

...
CREATE TABLE `parcel_with_no_partition_m05` (
...
)
CREATE TABLE `parcel_with_no_partition_m06` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '数据主键,毫无意义',
  `equip_id` int(11) DEFAULT NULL,
  `scan_id` int(11) DEFAULT NULL,
  `create_datetime` datetime DEFAULT NULL,
  `minute_of_hour` int(11) DEFAULT NULL,
  `hour_of_day` int(11) DEFAULT NULL,
  `day_of_month` int(11) DEFAULT NULL,
  `week_of_year` int(11) DEFAULT NULL,
  `month_of_year` int(11) DEFAULT NULL,
  `parcel_length` int(11) DEFAULT NULL
  PRIMARY KEY (`id`)
) ENGINE=innodb AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE `parcel_with_no_partition_m07` (
...
)
...

现在想尝试用 mysql8 的 partition 来进行分表,集中存储了 12 个月的一共 2 亿多条数据,每个分区 2000w 条左右的数据,与之前的表数据一致

CREATE TABLE `parcel_with_partition` (
  `equip_id` int(11) DEFAULT NULL,
  `scan_id` int(11) DEFAULT NULL,
  `create_datetime` datetime DEFAULT NULL,
  `minute_of_hour` int(11) DEFAULT NULL,
  `hour_of_day` int(11) DEFAULT NULL,
  `day_of_month` int(11) DEFAULT NULL,
  `week_of_year` int(11) DEFAULT NULL,
  `month_of_year` int(11) DEFAULT NULL,
  `parcel_length` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ENGINE=InnoDB PARTITION BY RANGE(`month_of_year`) (
PARTITION `JAN` VALUES LESS THAN (2),
PARTITION `FEB` VALUES LESS THAN (3),
PARTITION `MAR` VALUES LESS THAN (4),
PARTITION `APR` VALUES LESS THAN (5),
PARTITION `MAY` VALUES LESS THAN (6),
PARTITION `JUN` VALUES LESS THAN (7),
PARTITION `JUL` VALUES LESS THAN (8),
PARTITION `AUG` VALUES LESS THAN (9),
PARTITION `SEP` VALUES LESS THAN (10),
PARTITION `OCT` VALUES LESS THAN (11),
PARTITION `NOV` VALUES LESS THAN (12),
PARTITION `DEC` VALUES LESS THAN (13)
);

我尝试执行了一条业务 sql,发现效率下降了一倍

#从 mysql 自带的分区中取数据
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_partition` partition(`JUN`) where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `create_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set (59.56 sec)

#从原始分表中取数据
 select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_no_partition_m06` where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `sort_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
 2018 rows in set, 1 warning (29.13 sec)

又进一步用 explain 和 performance schema 分别观察了两种 sql

mysql 自带分区查询详情

explain:
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
| id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
|  1 | SIMPLE      | parcel_with_partition | JUN        | ALL  | NULL          | NULL | NULL    | NULL | 20481855 |     1.11 | Using where; Using temporary |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+

table i/o:
select * from table_io_waits_summary_by_table where count_star>0 \G;
*************************** 1. row ***************************
     OBJECT_TYPE: TABLE
   OBJECT_SCHEMA: test
     OBJECT_NAME: parcel_with_partition
      COUNT_STAR: 22123040
  SUM_TIMER_WAIT: 55709022384855
  MIN_TIMER_WAIT: 55709022384855
  AVG_TIMER_WAIT: 2517879
  MAX_TIMER_WAIT: 55709022384855
      COUNT_READ: 22123040
  SUM_TIMER_READ: 55709022384855
  MIN_TIMER_READ: 55709022384855
  AVG_TIMER_READ: 2517879
  MAX_TIMER_READ: 55709022384855
     COUNT_WRITE: 0
 SUM_TIMER_WRITE: 0
 MIN_TIMER_WRITE: 0
 AVG_TIMER_WRITE: 0
 MAX_TIMER_WRITE: 0
     COUNT_FETCH: 22123040
 SUM_TIMER_FETCH: 55709022384855
 MIN_TIMER_FETCH: 55709022384855
 AVG_TIMER_FETCH: 2517879
 MAX_TIMER_FETCH: 55709022384855
    COUNT_INSERT: 0
SUM_TIMER_INSERT: 0
MIN_TIMER_INSERT: 0
AVG_TIMER_INSERT: 0
MAX_TIMER_INSERT: 0
    COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
    COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0

file i/o:
                FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_partition#p#jun.ibd
               EVENT_NAME: wait/io/file/innodb/innodb_data_file
    OBJECT_INSTANCE_BEGIN: 140081009323904
               COUNT_STAR: 422002
           SUM_TIMER_WAIT: 3965120084812
           MIN_TIMER_WAIT: 5739112
           AVG_TIMER_WAIT: 9395831
           MAX_TIMER_WAIT: 498159493
               COUNT_READ: 422002
           SUM_TIMER_READ: 3965120084812
           MIN_TIMER_READ: 5739112
           AVG_TIMER_READ: 9395831
           MAX_TIMER_READ: 498159493
 SUM_NUMBER_OF_BYTES_READ: 6914080768
              COUNT_WRITE: 0
          SUM_TIMER_WRITE: 0
          MIN_TIMER_WRITE: 0
          AVG_TIMER_WRITE: 0
          MAX_TIMER_WRITE: 0
SUM_NUMBER_OF_BYTES_WRITE: 0
               COUNT_MISC: 0
           SUM_TIMER_MISC: 0
           MIN_TIMER_MISC: 0
           AVG_TIMER_MISC: 0
           MAX_TIMER_MISC: 0

原始分表查询详情

explain:
 +----+-------------+-----------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
| id | select_type | table                        | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
+----+-------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
|  1 | SIMPLE      | parcel_with_no_partition_m06 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 22123040 |     1.11 | Using where; Using temporary |
+----+-------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+

table i/o:
     OBJECT_TYPE: TABLE
   OBJECT_SCHEMA: test
     OBJECT_NAME: parcel_with_no_partition_m06
      COUNT_STAR: 22123040
  SUM_TIMER_WAIT: 23182194259854
  MIN_TIMER_WAIT: 23182194259854
  AVG_TIMER_WAIT: 1047813
  MAX_TIMER_WAIT: 23182194259854
      COUNT_READ: 22123040
  SUM_TIMER_READ: 23182194259854
  MIN_TIMER_READ: 23182194259854
  AVG_TIMER_READ: 1047813
  MAX_TIMER_READ: 23182194259854
     COUNT_WRITE: 0
 SUM_TIMER_WRITE: 0
 MIN_TIMER_WRITE: 0
 AVG_TIMER_WRITE: 0
 MAX_TIMER_WRITE: 0
     COUNT_FETCH: 22123040
 SUM_TIMER_FETCH: 23182194259854
 MIN_TIMER_FETCH: 23182194259854
 AVG_TIMER_FETCH: 1047813
 MAX_TIMER_FETCH: 23182194259854
    COUNT_INSERT: 0
SUM_TIMER_INSERT: 0
MIN_TIMER_INSERT: 0
AVG_TIMER_INSERT: 0
MAX_TIMER_INSERT: 0
    COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
    COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0

file i/o:
                FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_no_partition_m06.ibd
               EVENT_NAME: wait/io/file/innodb/innodb_data_file
    OBJECT_INSTANCE_BEGIN: 140081009372480
               COUNT_STAR: 424687
           SUM_TIMER_WAIT: 4045883911263
           MIN_TIMER_WAIT: 5301220
           AVG_TIMER_WAIT: 9526557
           MAX_TIMER_WAIT: 861143891
               COUNT_READ: 424687
           SUM_TIMER_READ: 4045883911263
           MIN_TIMER_READ: 5301220
           AVG_TIMER_READ: 9526557
           MAX_TIMER_READ: 861143891
 SUM_NUMBER_OF_BYTES_READ: 6958071808
              COUNT_WRITE: 0
          SUM_TIMER_WRITE: 0
          MIN_TIMER_WRITE: 0
          AVG_TIMER_WRITE: 0
          MAX_TIMER_WRITE: 0
SUM_NUMBER_OF_BYTES_WRITE: 0
               COUNT_MISC: 0
           SUM_TIMER_MISC: 0
           MIN_TIMER_MISC: 0
           AVG_TIMER_MISC: 0
           MAX_TIMER_MISC: 0

在 explain 和 file i/o 中都没有看出来异常情况,但是在 table i/o 时 内部分区 使用的时间远高于 原始的分表

这是不是说明多浪费的时间的罪魁祸首是逻辑 i/o 部分?为什么分表后会出现这样的结果呢?希望大神们多多指教

补充一下环境: 系统:centos7 存储引擎:mysql8 服务器配置:12 核 2.5Ghz 64G 内存 硬盘:机械硬盘 在执行查询 sql 时最大的内存占用 3G

3062 次点击
所在节点    MySQL
10 条回复
xuanbg
2020-06-17 23:36:50 +08:00
2000 万和 2.4 亿能一样? PARTITION BY RANGE 实际上还是 1 张表,只不过数据分开存罢了。
ajsonx
2020-06-18 00:02:23 +08:00
8 的窗口函数我试过几个,主要是为了编写方便吧。试过 1 千万的数据,性能上都不如写联表查询。
Pendragon
2020-06-18 00:09:11 +08:00
@xuanbg 我最开始也是这么想的,但是分析结果显示两种查询方法的硬盘 i/o 时间是一样的,占用的最大内存也是单张分表 /分区的大小,所以分区方式应该并没有把 2.4 亿的数据全部取出来
mahone3297
2020-06-18 00:18:12 +08:00
#从 mysql 自带的分区中取数据
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_partition` partition(`JUN`) where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `create_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set (59.56 sec)

#从原始分表中取数据
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_no_partition_m06` where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `sort_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set, 1 warning (29.13 sec)


* 2 个的条件不一样?
partition 都是 `create_datetime` >'2020-06-01' and `create_datetime` <= '2020-06-29'
分表 `create_datetime` >'2020-06-01' and `sort_datetime` <= '2020-06-29'
* 还有,你是 from `parcel_with_partition` partition(`JUN`) 这样指定其中一张分区表的?能否 from parcel_with_partition_JUN 这样来对比?
mahone3297
2020-06-18 00:19:28 +08:00
分区表的用法,应该是你 where 条件中,有了分区条件,使得,你的 sql 可以直接查询某几张表,而不是所有表。
flashrick
2020-06-18 09:40:30 +08:00
依我看,你分区的查询用错了啊,你用 month_of_year 分区的,查询的时候就是一条普通的 select 就行了,只是 where 条件要用到 month_of_year 。说错了请轻喷
Pendragon
2020-06-18 09:54:05 +08:00
@flashrick
@mahone3297
你说的没错 在 where 条件中指定 month_of_year 的值也能命中相应的分区,我在这里用 partition 强制限制分区是为了减少其他条件对查询的影响, 最终结果都是一样的
Still4
2020-06-18 10:00:05 +08:00
看了下分区表实现原理,每个分区一个文件,这样的话两个对照表文件大小一致,如果没有加索引,都是扫全表,执行时间不应该差很多啊

再复制一个 parcel_with_no_partition_m06 表,确认下机械盘内道外道的影响?
someonedeng
2020-06-24 17:13:16 +08:00
话说本来是分表的,为什么又合成一个大表再分区而不是直接对小表分区?
encro
2020-07-01 17:20:42 +08:00
既然你都是根据 create_datetime 查询的,那么就根据 create_datetime 分区即可

http://mysql.taobao.org/monthly/2017/11/09/

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/682506

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX