ON DUPLICATE KEY UPDATE 引起的死锁问题,求助一下

120 天前
 alexfarm

最近生产上遇到了一个死锁问题,看日志应该是和使用了 ON DUPLICATE KEY UPDATE 语法有关系。查阅了一些资料,这个语法的确在并发高时容易引起死锁问题,但主要都是 gap 锁和插入意图锁引起的。但我实际遇到的是和唯一索引、主键索引有关,在测试环境也无法复现,有没有大佬帮助一下。 以下是相关环境信息和日志信息, tb 是主要更新的表,tb1 和 tb2 是用于关联查询出插入内容

Mysql 5.7 RC 事务隔离级别

CREATE TABLE `tb` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `bid` varchar(38) NOT NULL,
  `sid` varchar(20) NOT NULL,
  `oid` varchar(20) NOT NULL,
  `emark` char(2) NOT NULL DEFAULT '00',
  `amark` char(2) NOT NULL DEFAULT '00',
  `bmark` char(2) NOT NULL DEFAULT '00',
  `cmark` char(2) NOT NULL DEFAULT '00',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_bid_sid_oid` (`bid`,`sid`,`oid`),
  KEY `idx_oid` (`oid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
CREATE TABLE `tb1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `sid` varchar(20) NOT NULL,
  `bid` varchar(38) NOT NULL,
  `oid` varchar(20) DEFAULT NULL,
  `status` char(2) NOT NULL DEFAULT '1',
  `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_bid_sid` (`bid`,`sid`),
  KEY `idx_sid` (`sid`),
  KEY `idx_oid` (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `tb2` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `bid` varchar(38) NOT NULL,
  `expire_time` datetime DEFAULT NULL,
  `status` char(2) NOT NULL,
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_name_list_batch_id` (`batch_id`),
  KEY `idx_name_list_expire_time` (`expire_time`),
  KEY `index_name_list_market_no` (`market_case_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

InnoDB: *** (1) TRANSACTION:

TRANSACTION 7367657071, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 3180164, OS thread handle 140056880891648, query id 2058483539
tb update

insert into tb
    ( 
    bid, sid, oid, emark, amark, bmark, cmark, update_time
   )
    values ('444444', '555555', '666666', '00', '00', '00', '00', now())
    ON DUPLICATE KEY UPDATE
    amark = values(amark),
    update_time = now()

2023-12-28T19:00:05.105996+08:00 3180165 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367657071 lock_mode X waiting Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
1: len 10; hex 30313939373333303331; asc 555555;;
2: len 6; hex 363733353839; asc 666666;;
3: len 8; hex 00000000000910e7; asc ;;

2023-12-28T19:00:05.106368+08:00 3180165 [Note]
InnoDB: *** (2) TRANSACTION:

TRANSACTION 7367656999, ACTIVE 2 sec inserting, thread declared inside InnoDB 236 mysql tables in use 3, locked 1 465 lock struct(s), heap size 57552, 38900 row lock(s), undo log entries 19415
MySQL thread id 3180165, OS thread handle 140056037168896, query id 2058483278
user Sending data

insert into tb
      ( 
    bid, sid, oid, emark, amark, bmark, update_time
   )
      select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now()
      from tb1 left join tb2 
      on tb1.bid = tb2.bid
      where tb2.expire_time < '2023-12-31 19:00:03.963' and tb2.expire_time > now()
        and tb1.status IN
         (  
          '2'
         , 
          '3'
         ) 
       
      ON DUPLICATE KEY UPDATE
      emark = values(emark),
      update_time = now()

2023-12-28T19:00:05.106420+08:00 3180165 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367656999 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
1: len 10; hex 30313939373333303331; asc 555555;;
2: len 6; hex 363733353839; asc 666666;;
3: len 8; hex 00000000000910e7; asc ;;

2023-12-28T19:00:05.139592+08:00 3180165 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 270 page no 7273 n bits 88 index PRIMARY of table db.tb trx id 7367656999 lock_mode X locks rec but not gap waiting
Record lock, heap no 14 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 00000000000aaf6c; asc l;;
1: len 6; hex 0001b725866f; asc % o;;
2: len 7; hex e20000118b0110; asc ;;
3: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
4: len 10; hex 30313939373333303331; asc 555555;;
5: len 6; hex 363733353839; asc 666666;;
6: len 2; hex 3030; asc 00;;
7: len 2; hex 3030; asc 00;;
8: len 2; hex 3030; asc 00;;
9: len 2; hex 3030; asc 00;;
10: len 5; hex 99b1f93004; asc 0 ;;
11: len 5; hex 99b1f93004; asc 0 ;;

2023-12-28T19:00:05.140247+08:00 3180165 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

832 次点击
所在节点    MySQL
10 条回复
alexfarm
117 天前
求大神给点排查思路啊~
alexfarm
115 天前
h0099
115 天前
https://www.v2ex.com/t/908047#r_14219204
https://www.v2ex.com/t/1005536#r_14219181

我的评价是:建议先大脑升级`最新最热最凶最恶`mysql8.2
alexfarm
115 天前
@h0099 这个升级决定不了呀
h0099
115 天前
事务 2`INSERT INTO tb SELECT ...`的 subquery
```sql
select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now()
from tb1 left join tb2
on tb1.bid = tb2.bid
where tb2.expire_time < '2023-12-31 19:00:03.963'
and tb2.expire_time > now()
and tb1.status IN ('2', '3')
```
返回中有没有`(bid, sid, oid)`=`('444444', '555555', '666666')`的行?
h0099
115 天前
h0099
115 天前
首先您 tb2 删漏了
```diff
CREATE TABLE `tb2` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`bid` varchar(38) NOT NULL,
`expire_time` datetime DEFAULT NULL,
`status` char(2) NOT NULL,
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
- UNIQUE KEY `idx_name_list_batch_id` (`batch_id`),
+ UNIQUE KEY `idx_name_bid` (`bid`),
KEY `idx_name_list_expire_time` (`expire_time`),
- KEY `index_name_list_market_no` (`market_case_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
```
在 mysql8.0.35 下创建表后
```sql
INSERT INTO tb1(sid, bid, oid, status) VALUES(0, 1, 2, 3);
INSERT INTO tb2(bid, expire_time, status) VALUES(1, ADDDATE(NOW(), 2), 3);
```
默认 RR 事务隔离级别下
```sql
-- session1
START TRANSACTION;
insert into tb(bid, sid, oid, emark, amark, bmark, cmark, update_time) values ('444444', '555555', '666666', '00', '00', '00', '00', now()) ON DUPLICATE KEY UPDATE amark = values(amark), update_time = now();
-- 重复执行
ROLLBACK;
```
```sql
-- session2 您的`'2023-12-31 19:00:03.963'`我改成了`ADDDATE(NOW(), 3)`以便 reprod
START TRANSACTION;
insert into tb(bid, sid, oid, emark, amark, bmark, update_time) select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < ADDDATE(NOW(), 3) and tb2.expire_time > now() and tb1.status IN ('2', '3') ON DUPLICATE KEY UPDATE emark = values(emark), update_time = now();
-- 重复执行
ROLLBACK;
```
不论两个谁先执行执行几次它们都不会等待 acquire 另一事务占有的锁
而在 RC 下
|session1|session2|
|-|-|
|`SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;`|Ibid.|
|`START TRANSACTION;`|Ibid.|
|`SHOW SESSION VARIABLES LIKE '%iso%'\G`|Ibid.|
|`insert into tb(bid, sid, oid, emark, amark, bmark, cmark, update_time) values ('444444', '555555', '666666', '00', '00', '00', '00', now()) ON DUPLICATE KEY UPDATE amark = values(amark), update_time = now();`||
||`insert into tb(bid, sid, oid, emark, amark, bmark, update_time) select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < ADDDATE(NOW(), 3) and tb2.expire_time > now() and tb1.status IN ('2', '3') ON DUPLICATE KEY UPDATE emark = values(emark), update_time = now();`|
||等待 session1 释放某个锁|
|`ROLLBACK;`||
||继续执行并返回|

而如果 session2 先执行那后续它们以任意顺序执行都不会有等待
这可能是因为`tb`此时作为空表(`CREATE`后没有`INSERT`过) session1`INSERT`后需要表级锁以保证其他事务能够 RC (看到其刚`INSERT`的这行,但由于不是`READ UNCOMMITTED`RU 级别所以需要等待 session1`ROLLBACK`/`COMMIT`)
https://stackoverflow.com/questions/56434878/mysql-deadlock-using-an-index-with-a-new-value/56435470#56435470
> In an empty table, the place that the 1 would have been is "anywhere in the table" (or anywhere from the start to the "supremum" mentioned in the deadlock) - which is consequently gaplocked by the delete. The same is true for the 2. And those locks do not conflict each other by definition.

而在 session2 先`INSERT`后`tb`不是空表所以 session1 无需表级锁而可以在索引`idx_bid_sid`上锁`('444444', '555555')`,而 session2`INSERT SELECT`subquery 所返回行`(bid, sid)=(1, 0)`中也没有跟`('444444', '555555')`冲突因而无需`ON DUPLICATE KEY UPDATE`从而避免等待 session1 释放对那行的锁
所以您的`tb`表在 session1 执行`INSERT`时是空的吗?
h0099
115 天前
如果您能够修改逻辑避免在 sql 中用到`ON DUPLICATE KEY UPDATE`aka 您不是只能加索引调参优化的 dba 而是透过 orm 层层抽象隔离写 java 业务的 dev 那您也可以像您 at 我过来的那帖中那样换成`RC 事务隔离级别`+`SELECT ... WHERE unique 约束或 PK FOR UPDATE`产生`表级 IX 锁` https://github.com/n0099/open-tbm/issues/32#issuecomment-1401223094 对`SELECT`出既有的 record 来`UPDATE ... SET a|emark = ... WHERE unique 约束或 PK`再对不存在的 record 进行`INSERT`此时可以保证`INSERT`不会返回`ERROR 1062 (23000): Duplicate entry`因为此前的`SELECT ... FOR UPDATE`已经上了`表级 IX 锁` https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-intention-locks 而此时其他`INSERT`需要`行级 X 锁`就会阻塞等待`表级 IX 锁`释放
> Before a transaction can acquire an exclusive lock on a row in a table, it must first acquire an IX lock on the table.
h0099
115 天前
如果单独只看您发的那坨`SHOW ENGINE INNODB STATUS`deadlock detection log 的话那当时场景就是
session2 先执行因而持有着`(bid, sid, oid)`
```
RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367656999 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
```
上的末端`(某值, +∞)`[`nextkey`]( https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-next-key-locks)X 锁因为有[`hex 73757072656d756d; asc supremum`]( https://en.wikipedia.org/wiki/Infimum_and_supremum) https://dev.mysql.com/blog-archive/innodb-data-locking-part-2-5-locks-deeper-dive/

而后执行的 session1 也在尝试 acquire`(bid, sid, oid)=('444444', '555555', '666666')`上的 recordlockX 锁但由于其位于 session2 的`nextkey`X 锁范围中而等待
```
RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367657071 lock_mode X waiting Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
```

当 session1 在等待时 session2 也开始等待 PK (但下面的约束不是在`PRIMARY KEY (id)`上而是`INSERT`中的其他字段可能因为其是`AUTO_INCREMENT`AI 而您也没显式指定其值所以无法`WHERE`)`(bid, sid, oid, emark, amark, bmark, cmark)=('444444', '555555', '666666', '00', '00', '00', '00')`上的[recordlock(有`but not gap`)]( https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-record-locks)X
```
RECORD LOCKS space id 270 page no 7273 n bits 88 index PRIMARY of table db.tb trx id 7367656999 lock_mode X locks rec but not gap waiting
[...]
3: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
4: len 10; hex 30313939373333303331; asc 555555;;
5: len 6; hex 363733353839; asc 666666;;
[...]
```
所以是互相等待对方而被 deadlock detection 并决定 rollback 后来的 session1 让 session2 先过
```
2023-12-28T19:00:05.140247+08:00 3180165 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)
```
alexfarm
110 天前
@h0099 感谢大佬,前几天没登录,我先慢慢看你写的,谢谢!

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/1008054

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX