小心区间锁

某日收到项目监控系统报警, 经查询业务日志后发现, 是数据库中发生一处死锁触发业务异常回滚导致的.

执行 show engine innodb status; 得到如下死锁日志(相关数据已做脱敏处理):

------------------------
LATEST DETECTED DEADLOCK  
------------------------
2018-01-01 20:09:11 7fa66d7be700  
*** (1) TRANSACTION:
TRANSACTION 138251491, ACTIVE 0.152 sec inserting  
mysql tables in use 1, locked 1  
LOCK WAIT 12 lock struct(s), heap size 2936, 7 row lock(s), undo log entries 6  
LOCK BLOCKING MySQL thread id: 84192442 block 17087795  
MySQL thread id 17087795, OS thread handle 0x7fa6897a0700, query id 2011577417 172.1.1.1 exchange update  
insert into orders (id, order_no, account_no,  
      account_type, level, symbol,
      action, order_type, price_limit,
      price_average, quantity, quantity_remaining,
      amount, amount_remaining, fee,
      fee_rate, status, state,
      from_client_type, utc_create, utc_update
      )
    values (null, 2222222, 3333333,
      1, 1, 'XXX',
      2, 1, 2000.00,
      0, 1.5000, 1.5000,
      0, 0, 0,
      0.00000000000, 0, 0,
      null, '2018-01-01 12:09:11.532', '2018-01-01 12:09:11.532'
      )
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 313 page no 182549 n bits 1000 index `uniq_order_no` of table `xxx`.`orders` trx id 138251491 lock_mode X insert intention waiting  
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0  
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 138251502, ACTIVE 0.026 sec starting index read  
mysql tables in use 1, locked 1  
10 lock struct(s), heap size 2936, 10 row lock(s), undo log entries 3  
MySQL thread id 84192442, OS thread handle 0x7fa66d7be700, query id 2011577436 172.1.1.1 exchange statistics


SELECT id,account_no ,user_no , account_type ,currency ,amount ,freeze_amount ,status ,  
                        create_at ,update_at
                        FROM account_info
          WHERE ACCOUNT_TYPE = 'GENERAL_ACCOUNT' AND USER_NO = 1111111 AND CURRENCY = 'USD'  for update

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 313 page no 182549 n bits 1000 index `uniq_order_no` of table `xxx`.`orders` trx id 138251502 lock_mode X  
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0  
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 110 page no 11 n bits 448 index `uniq_user_no_account_type_currency` of table `xxx`.`accounts` trx id 138251502 lock_mode X locks rec but not gap waiting  
Record lock, heap no 105 PHYSICAL RECORD: n_fields 4; compact format; info bits 0  
 0: len 8; hex 9602ffaf05a03001; asc       0 ;;
 1: len 15; hex 47454e4552414c5f4143434f554e54; asc GENERAL_ACCOUNT;;
 2: len 3; hex 555344; asc USD;;
 3: len 8; hex 8000000000000086; asc         ;;

*** WE ROLL BACK TRANSACTION (2)

结合该日志可以看出, 发生死锁的语句发生在 orders 表和 accounts 表中, 根据日志内容可有如下分析结果:

[事务1] 在向 orders 表中插入数据时, 无法获得在 uniq_order_no 索引上该位置的排它锁, 说明该位置所在的区间已被其他事务持有

[事务2] 在获取 accounts 表上某行的排他锁时被阻塞, 但 [事务2] 持有了 [事务1] 所需要申请的在 orders 表中某位置的锁

可以推测出 [事务1] 在向 orders 表插入数据前, 也正好持有了 [事务2] 在 accounts 所需要的锁, 死锁因此发生

因为是在数据插入 orders 表之前发生了唯一键 order_no 上的锁被占用情况, 所以可以推测出, 该问题是由于 order_no=2222222 所处的区间(GAP)被锁导致的! 也就是说, 在 [事务2] 所涉及的业务里, 在锁账务之前的业务流程中, 发生了可能会锁住 orders 表中一段区间的SQL语句.

结合具体业务场景, 我们可作出如下推测:

事务1: 挂单业务. 具体流程如下: 冻结A用户账户 → 向 orders 表中插入一条记录.

事务2: 撮合业务. 具体流程如下: 查询某委托单A的对手单(B,C,D) → 为(A,B,C,D)加锁, 当A未被锁上时业务回滚, 当BCD未完全被锁上时, 业务可继续 → 冻结相关账户等

死锁点: 在事务2中, 当在锁BCD前其中有撤单,且撤掉的单是 orders 中的最后一条数据时,该位置到正无穷的区间会被锁住. 这样事务1在向 orders 被锁住的区间插入新数据时会被阻塞, 而事务1又可能刚好持有了事务2中相关账户的排它锁. 死锁因此形成.

具体死锁过程, 可通过以下例子来模拟:

建表:

CREATE TABLE `accounts` (  
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `number` int(10) NOT NULL COMMENT '编号',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_number` (`number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `accounts` (`id`, `number`)  
VALUES  
    (1,1),
    (2,2),
    (3,5);

CREATE TABLE `orders` (  
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `number` int(10) NOT NULL COMMENT '编号',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_number` (`number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `orders` (`id`, `number`)  
VALUES  
    (1,1),
    (2,7);

事务过程:

死锁日志如下:

------------------------
LATEST DETECTED DEADLOCK  
------------------------
2018-02-10 07:54:33 0x7f61545c6700  
*** (1) TRANSACTION:
TRANSACTION 10253692, ACTIVE 25 sec inserting  
mysql tables in use 1, locked 1  
LOCK WAIT 4 lock struct(s), heap size 1136, 5 row lock(s)  
MySQL thread id 21, OS thread handle 140056004962048, query id 21576 172.19.0.1 root update  
insert into orders values(null, 10)  
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6104 page no 3 n bits 72 index PRIMARY of table `test`.`orders` trx id 10253692 lock_mode X insert intention waiting  
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0  
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 10253693, ACTIVE 21 sec starting index read  
mysql tables in use 1, locked 1  
4 lock struct(s), heap size 1136, 4 row lock(s)  
MySQL thread id 26, OS thread handle 140056003897088, query id 21577 172.19.0.1 root Sending data  
select * from accounts where number=1 for update  
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 6104 page no 3 n bits 72 index PRIMARY of table `test`.`orders` trx id 10253693 lock_mode X  
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0  
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0  
 0: len 4; hex 00000001; asc     ;;
 1: len 6; hex 0000009c7575; asc     uu;;
 2: len 7; hex ae000001e90110; asc        ;;
 3: len 4; hex 80000001; asc     ;;

Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0  
 0: len 4; hex 00000002; asc     ;;
 1: len 6; hex 0000009c7575; asc     uu;;
 2: len 7; hex ae000001e9011d; asc        ;;
 3: len 4; hex 80000007; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6103 page no 3 n bits 72 index PRIMARY of table `test`.`accounts` trx id 10253693 lock_mode X waiting  
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0  
 0: len 4; hex 00000001; asc     ;;
 1: len 6; hex 0000009c756c; asc     ul;;
 2: len 7; hex a7000004a10110; asc        ;;
 3: len 4; hex 80000001; asc     ;;

*** WE ROLL BACK TRANSACTION (2)

在交易所的具体业务中, 出现该死锁需要满足以下几个条件:

  1. 账户A撤回了刚刚挂上的委托单, 且该单是 orders 表中最新的一个
  2. 撤单的前一个瞬间该委托单作为被动方被稍早时间插入的对手单匹配
  3. 在这笔撮合没有完成的极短时间范围内, 账户A又向 orders 表中挂入一个新单(出现概率低,且普通用户操作很难实现)
Show Comments