MySQL 5.7中锁的一个通用问题

时间:2022-05-05
本文章向大家介绍MySQL 5.7中锁的一个通用问题,主要内容包括其使用实例、应用技巧、基本知识点总结和需要注意事项,具有一定的参考价值,需要的朋友可以参考一下。

前几天分析了一个死锁的问题,有一个网友看了以后,就发了邮件给我问一个问题。一般来说,能够发送邮件提出问题的同学,都是很认真的,因为他要准备好日志,准备好操作过程,准备好他已经在做的事情。所以这类问题,我都会认真的分析一下,如果没有结果,那就继续分析再等等,掐指一算,有很多问题已经拖了好久了。

这位网友提的一个问题,我看了以后感觉很是奇怪,因为有些颠覆我对MySQL锁的一些认识。这该如何是好。

这个环境的事务隔离级别是RR,存在主键,存在范围查询。

如何复现这个问题,网友提供了信息。

创建表
mysql> create table tt(a int not null primary key) engine=innodb;
mysql> insert into tt values(10),(20),(30),(40),(50);
复现这个问题可以参考:
session1:
mysql> set session tx_isolation='repeatable-read';
mysql> begin;
mysql> select * from tt where a > 15 and a < 35 for update;
+----+
| a  |
+----+
| 20 |
| 30 |
+----+
session2:
mysql>  insert into tt select 1;
此时这个操作会被阻塞,如果你按照这个思路来看,总是会感觉不对劲。
怎么MySQL这么矫情了。
我带着疑问在新搭建的一套MySQL 5.7环境上做了测试,结果还真是。
接下来的任务就是如何说服我,然后我理解了来说服这个网友。

结果这样一个操作下来,我连连测试了5个场景,如何SQL稍作改变,结果又会大大不同。

#for update的场景1

先来做一个基于主键的操作。先来验证一个最基本的情况,稳定下自己的情绪。

#session1

mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> select *from tt;
+----+
| a  |
+----+
| 10 |
| 20 |
| 30 |
| 40 |
| 50 |
+----+
5 rows in set (0.00 sec)

mysql> select * from tt where a =10 for update;
+----+
| a  |
+----+
| 10 |
+----+
1 row in set (0.00 sec)

#session2
mysql> insert into tt select 1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

这是一个最为保守的使用方法,如果这个还有问题,那就明显证明数据库有问题了,基于主键,去掉范围扫描,肯定妥妥的。

#for update的场景2

这个场景里面我们修改下范围,原来的(15,35)修改为(10,30),结果差别就很大了。有些阻塞的语句我直接就手工取消了。由此也可以看出其中的差别来,不过可能会看得有点懵了。

#session1
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tt where a <30 and a>10 for update;
+----+
| a  |
+----+
| 20 |
+----+
1 row in set (0.00 sec)

session2:
mysql> insert into tt select 35;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tt select 31;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tt select 30;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted

mysql> insert into tt select 5;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tt select 10;
ERROR 1062 (23000): Duplicate entry '10' for key 'PRIMARY'
mysql> insert into tt select 11;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted

这里可以明显感觉到和当前环境的数据分布关系很微妙,范围只有一个20,但是似乎和0有着一定的联系,至少,我不能保证我的查询一定得按照这个精确的范围。

#for update 场景3

这个场景我把最开始碰到的问题做了一些扩展,看看其它范围的数据是否也有类似的情况。我扩大了数据范围,结果很明显的,结果让我有些意料之外。

session1:
mysql> select *from tt;
+----+
| a  |
+----+
| 10 |
| 20 |
| 30 |
| 40 |
| 50 |
+----+
5 rows in set (0.00 sec)
mysql> begin;select * from tt where a <35 and a>15 for update;
Query OK, 0 rows affected (0.00 sec)

+----+
| a  |
+----+
| 20 |
| 30 |
+----+
2 rows in set (0.00 sec)

session2:
mysql>  insert into tt select 1;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 9;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 10;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 35;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 36;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 40;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 50;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted

#for update 场景4

尽管这个时候我已经有些乱了,但是我还是耐着性子测试了另外几个场景。我把范围有(15,35)修改为(15,30),结果让我很意外。原本阻塞的insert就可以了。


session1
mysql> begin;select * from tt where a <30 and a>15 for update;
Query OK, 0 rows affected (0.00 sec)

+----+
| a  |
+----+
| 20 |
+----+
1 row in set (0.00 sec)

session2
mysql>  insert into tt select 1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql>  insert into tt select 15;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 15;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 16;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 14;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 13;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 11;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 10;
ERROR 1062 (23000): Duplicate entry '10' for key 'PRIMARY'
mysql>  insert into tt select 9;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql>  insert into tt select 30;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 31;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings:

这一切的对比,从直观感受来看是和表里的数据分布是有一定的关系的。

比如场景4,如果我把范围由(15,35)修改为(15,30),这个数据的情况有什么特别之处吗,从我的猜测来看,应该是和里面的索引存储有一定的关系,我查看了Information_schema.innodb_trx,innodb_locks的细节,里面都是指向了同一行。

mysql> select * from INFORMATION_SCHEMA.innodb_locksG;
*************************** 1. row ***************************
    lock_id: 4081:36:3:2
lock_trx_id: 4081
  lock_mode: X,GAP
  lock_type: RECORD
 lock_table: `test`.`tt`
 lock_index: PRIMARY
 lock_space: 36
  lock_page: 3
   lock_rec: 2
  lock_data: 10
*************************** 2. row ***************************
    lock_id: 4078:36:3:2
lock_trx_id: 4078
  lock_mode: X
  lock_type: RECORD
 lock_table: `test`.`tt`
 lock_index: PRIMARY
 lock_space: 36
  lock_page: 3
   lock_rec: 2
  lock_data: 10
2 rows in set, 1 warning (0.00 sec)

通过上面的信息可以看到,都是只想了页面3的数据第2行,这个明显就不对应啊。

但是MySQL 5.7中出现这个问题,自己还是带着一丝的侥幸心理,在MGR上测试了一把,能够复现,结果今天继续耐着性子看了下这个问题,在5.6上模拟了一下,5.6全然没有这个问题,问题到了这里,就有了柳暗花明的一面,能够肯定的是这个问题在MySQL 5.7中可以复现,在MySQL 5.6中是正常的。

如此一来,问题的定论就有了方向,很快就在bugs.mysql.com里面找到了一个相关的bug(85749)

里面也做了类似的测试,能够复现,MySQL官方做了确认。

[31 Mar 18:10] Sinisa Milivojevic

Hi!
I have run your test case and got the same results as you have.
Upon further analysis, I concluded that this is a bug.  A small bug , but a bug.
Verified.
而有看点的是问题的提出者定位到了相关的代码,还是希望文档的部分能够把间隙锁的部分补充一下。
No locks are released in this case, but we do request X lock on the gap before the next, non-matching record when non-unique secondary index is used. Check code starting from this line (https://github.com/mysql/mysql-server/blob/71f48ab393bce80a59e5a2e498cd1f46f6b43f9a/storag...):
			/* Try to place a gap lock on the next index record
			to prevent phantoms in ORDER BY ... DESC queries */
			const rec_t*	next_rec = page_rec_get_next_const(rec);

			offsets = rec_get_offsets(next_rec, index, offsets,
						  ULINT_UNDEFINED, &heap);
			err = sel_set_rec_lock(pcur,
					       next_rec, index, offsets,
					       prebuilt->select_lock_type,
LOCK_GAP, thr, &mtr);

in row_search_mvcc(). See the (potential) reason to set this gap lock in the comment above.
Maybe there is another reason for the behavior we see. Then it should be also documented.