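2. Analysis of SHOW TABLE STATUS [LIKE 'A'] in Waiting for table metadata lock
This is the most important part of the case. SHOW TABLE STATUS [LIKE 'A'] was blocked with STATE Waiting for table metadata lock. Note that the state says "table", because MDL locks come in many kinds. In my earlier article introducing MDL I mentioned that DESC on a table takes MDL_SHARED_HIGH_PRIO(SH). SHOW TABLE STATUS takes MDL_SHARED_HIGH_PRIO(SH) on the table as well.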
mysql> SHOW TABLE STATUS like 'a' \G
2017-11-10T03:01:48.142334Z 6 [Note] (acquire_lock)**THIS MDL LOCK acquire WAIT(MDL_LOCK WAIT QUE)!**
2017-11-10T03:01:48.142381Z 6 [Note] (>MDL PRINT) Thread id is 6:
2017-11-10T03:01:48.142396Z 6 [Note] (->MDL PRINT) DB_name is:test
2017-11-10T03:01:48.142409Z 6 [Note] (-->MDL PRINT) OBJ_name is:a
2017-11-10T03:01:48.142421Z 6 [Note] (--->MDL PRINT) Namespace is:TABLE
2017-11-10T03:01:48.142434Z 6 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED_HIGH_PRIO(SH)
2017-11-10T03:01:48.142447Z 6 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
*************************** 7. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: a
OBJECT_INSTANCE_BEGIN: 140733864665152
LOCK_TYPE: SHARED_HIGH_PRIO
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_base.cc:2821
OWNER_THREAD_ID: 38
OWNER_EVENT_ID: 1695
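Both outputs (my debug log and performance_schema.metadata_locks) show the MDL_SHARED_HIGH_PRIO(SH) request, and what I simulated here is the blocked case.
However, MDL_SHARED_HIGH_PRIO(SH) is a very high-priority MDL lock type. Its compatibility is as follows: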
Request   |  Granted requests for lock                  |
type      | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X  |
----------+---------------------------------------------+
SH        | +  +   +   +   +     +   +    +    +     -  |

Request   |  Pending requests for lock       |
type      | S  SH  SR  SW  SU  SNW  SNRW  X  |
----------+----------------------------------+
SH        | +  +   +   +   +   +    +     +  |
The only thing that can block it is MDL_EXCLUSIVE(X); there is no other possibility. This is a very important lead.
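3. Analysis of the MDL lock that CREATE TABLE A AS SELECT B takes on table A
This is something I did not know before, and it is where I spent the most time in this case. As analyzed above, a statement such as SHOW TABLE STATUS [LIKE 'A'], which only takes an MDL_SHARED_HIGH_PRIO(SH) lock, can be blocked on an MDL lock in only one way: table A holds MDL_EXCLUSIVE(X). So I began to suspect that this DDL statement holds MDL_EXCLUSIVE(X) on table A until the statement ends. An actual test confirmed it, as shown below: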
2017-11-10T05:38:16.824713Z 4 [Note] (acquire_lock)THIS MDL LOCK acquire ok!
2017-11-10T05:38:16.824727Z 4 [Note] (>MDL PRINT) Thread id is 4:
2017-11-10T05:38:16.824739Z 4 [Note] (->MDL PRINT) DB_name is:test
2017-11-10T05:38:16.824752Z 4 [Note] (-->MDL PRINT) OBJ_name is:a
2017-11-10T05:38:16.824764Z 4 [Note] (--->MDL PRINT) Namespace is:TABLE
2017-11-10T05:38:16.824776Z 4 [Note] (---->MDL PRINT) Fast path is:(Y)
2017-11-10T05:38:16.824788Z 4 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED(S)
2017-11-10T05:38:16.824799Z 4 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
2017-11-10T05:38:16.825286Z 4 [Note] (upgrade_shared_lock)THIS MDL LOCK upgrade TO
2017-11-10T05:38:16.825312Z 4 [Note] (>MDL PRINT) Thread id is 4:
2017-11-10T05:38:16.825332Z 4 [Note] (->MDL PRINT) DB_name is:test
2017-11-10T05:38:16.825345Z 4 [Note] (-->MDL PRINT) OBJ_name is:a
2017-11-10T05:38:16.825357Z 4 [Note] (--->MDL PRINT) Namespace is:TABLE
2017-11-10T05:38:16.825369Z 4 [Note] (----->MDL PRINT) Mdl type is:MDL_EXCLUSIVE(X)
2017-11-10T05:38:16.825381Z 4 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: a
OBJECT_INSTANCE_BEGIN: 140733998842016
LOCK_TYPE: SHARED
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:6314
OWNER_THREAD_ID: 36
OWNER_EVENT_ID: 1553
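Unfortunately, performance_schema.metadata_locks does not show MDL_EXCLUSIVE(X) here; it shows MDL_SHARED(S). In my log output, however, we can see an upgrade from MDL_SHARED(S) to MDL_EXCLUSIVE(X). According to the compatibility matrix above, only MDL_EXCLUSIVE(X) blocks MDL_SHARED_HIGH_PRIO(SH). So we can be confident that the upgrade really happened; otherwise SHOW TABLE STATUS [LIKE 'A'] would not be blocked.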
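To tie a blocked request to the session that holds the lock, the pending and granted rows of performance_schema.metadata_locks can be matched on the same object. The query below is only a sketch: the column aliases are mine, it assumes the wait/lock/metadata/sql/mdl instrument is enabled, and, as noted above, the holder's lock type may be displayed as SHARED even though it has been upgraded to MDL_EXCLUSIVE(X).

```sql
-- Sketch: for every pending table MDL request, list the other threads
-- that hold a granted MDL lock on the same table.
-- Assumes wait/lock/metadata/sql/mdl is enabled in setup_instruments.
SELECT w.OBJECT_SCHEMA,
       w.OBJECT_NAME,
       w.LOCK_TYPE         AS waiting_lock_type,
       wt.PROCESSLIST_ID   AS waiting_conn_id,
       wt.PROCESSLIST_INFO AS waiting_statement,
       g.LOCK_TYPE         AS granted_lock_type,  -- may read SHARED after an upgrade
       gt.PROCESSLIST_ID   AS holder_conn_id,
       gt.PROCESSLIST_INFO AS holder_statement
FROM performance_schema.metadata_locks AS w
JOIN performance_schema.metadata_locks AS g
  ON  g.OBJECT_TYPE     = w.OBJECT_TYPE
  AND g.OBJECT_SCHEMA   = w.OBJECT_SCHEMA
  AND g.OBJECT_NAME     = w.OBJECT_NAME
  AND g.LOCK_STATUS     = 'GRANTED'
  AND g.OWNER_THREAD_ID <> w.OWNER_THREAD_ID
JOIN performance_schema.threads AS wt ON wt.THREAD_ID = w.OWNER_THREAD_ID
JOIN performance_schema.threads AS gt ON gt.THREAD_ID = g.OWNER_THREAD_ID
WHERE w.OBJECT_TYPE = 'TABLE'
  AND w.LOCK_STATUS = 'PENDING';
```

The holder_statement column is usually the quickest way to spot a long-running CREATE TABLE ... AS SELECT.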
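4. Analysis of SELECT * FROM A in Waiting for table metadata lock
You might think that SELECT takes no locks, but that is only true at the InnoDB layer. At the MySQL server layer it takes MDL_SHARED_READ(SR), as shown below: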
select * from a;
2017-11-10T03:31:31.209772Z 6 [Note] (acquire_lock)THIS MDL LOCK acquire WAIT(MDL_LOCK WAIT QUE)!
2017-11-10T03:31:31.209824Z 6 [Note] (>MDL PRINT) Thread id is 6:
2017-11-10T03:31:31.209851Z 6 [Note] (->MDL PRINT) DB_name is:test
2017-11-10T03:31:31.209870Z 6 [Note] (-->MDL PRINT) OBJ_name is:a
2017-11-10T03:31:31.209885Z 6 [Note] (--->MDL PRINT) Namespace is:TABLE
2017-11-10T03:31:31.209965Z 6 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED_READ(SR)
2017-11-10T03:31:31.209985Z 6 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: a
OBJECT_INSTANCE_BEGIN: 140733864625136
LOCK_TYPE: SHARED_READ
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_parse.cc:6314
OWNER_THREAD_ID: 38
OWNER_EVENT_ID: 1764
We can see that the MDL_SHARED_READ(SR) request does exist and is currently blocked.
Its compatibility is as follows:
Request   |  Granted requests for lock                  |
type      | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X  |
----------+---------------------------------------------+
SR        | +  +   +   +   +     +   +    +    -     -  |
Clearly MDL_SHARED_READ(SR) is incompatible with the MDL_EXCLUSIVE(X) held on table A (the last column of the row above), so it has to wait.
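5. Analysis of DROP TABLE A in Waiting for table metadata lock
This one is easy to analyze: table A already holds an X lock, and DROP TABLE A must take MDL_EXCLUSIVE(X), which is of course incompatible with the existing MDL_EXCLUSIVE(X). As shown below: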
drop table a;
2017-11-09T10:58:28.673015Z 3 [Note] (acquire_lock)THIS MDL LOCK acquire ok!
2017-11-09T10:58:28.673030Z 3 [Note] (>MDL PRINT) Thread id is 3:
2017-11-09T10:58:28.673042Z 3 [Note] (->MDL PRINT) DB_name is:test
2017-11-09T10:58:28.673054Z 3 [Note] (-->MDL PRINT) OBJ_name is:t10
2017-11-09T10:58:28.673067Z 3 [Note] (--->MDL PRINT) Namespace is:TABLE
2017-11-09T10:58:28.673094Z 3 [Note] (----->MDL PRINT) Mdl type is:MDL_EXCLUSIVE(X)
2017-11-09T10:58:28.673109Z 3 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: a
OBJECT_INSTANCE_BEGIN: 140733864625472
LOCK_TYPE: EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_parse.cc:6314
OWNER_THREAD_ID: 38
OWNER_EVENT_ID: 1832
Here EXCLUSIVE is the MDL_EXCLUSIVE(X) we have been talking about. The request does exist and is currently blocked.
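The same waits can also be read from the sys schema, which ships with MySQL 5.7 and builds on performance_schema.metadata_locks. The query below is a sketch under the assumption that the sys schema is installed and the MDL instrument is enabled; the schema and table names come from the test case above.

```sql
-- Sketch: sessions waiting on a table MDL lock for test.a and the
-- sessions blocking them, as reported by the sys schema view.
SELECT object_schema,
       object_name,
       waiting_pid,
       waiting_lock_type,
       waiting_query,
       blocking_pid,
       blocking_lock_type,
       sql_kill_blocking_query   -- ready-made KILL QUERY statement, use with care
FROM sys.schema_table_lock_waits
WHERE object_schema = 'test'
  AND object_name   = 'a';
```

Because the view reads the same instrumentation, the lock type reported for the blocker is subject to the same display caveat as performance_schema.metadata_locks.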
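6. Why does use db also block?
When the mysql client is used without the -A option (or --no-auto-rehash), USE DB has to do at least the following:
1. Take an MDL (SH) lock on every table in the db (by calling MDL_context::acquire_lock; the output below was captured while blocked):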
use test
2017-11-10T03:46:50.223628Z 5 [Note] (acquire_lock)THIS MDL LOCK acquire WAIT(MDL_LOCK WAIT QUE)!
2017-11-10T03:46:50.223666Z 5 [Note] (>MDL PRINT) Thread id is 5:
2017-11-10T03:46:50.223696Z 5 [Note] (->MDL PRINT) DB_name is:test
2017-11-10T03:46:50.223714Z 5 [Note] (-->MDL PRINT) OBJ_name is:a
2017-11-10T03:46:50.223725Z 5 [Note] (--->MDL PRINT) Namespace is:TABLE
2017-11-10T03:46:50.223735Z 5 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED_HIGH_PRIO(SH)
2017-11-10T03:46:50.223755Z 5 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
*************************** 7. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: a
OBJECT_INSTANCE_BEGIN: 140733797429008
LOCK_TYPE: SHARED_HIGH_PRIO
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_base.cc:2821
OWNER_THREAD_ID: 37
OWNER_EVENT_ID: 187
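We can see that USE DB is indeed blocked as well, on MDL_SHARED_HIGH_PRIO(SH).
2. Add each table to the table cache and open it (by calling open_table_from_share()).
So this situation is exactly the same as SHOW TABLE STATUS [LIKE 'A'] being blocked: it is also caused by incompatible MDL locks.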
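All of the blocked statements analyzed so far show the same STATE, so a quick way to list them together is to filter the process list on that state. This is only a sketch; it assumes the account may see other sessions (PROCESS privilege).

```sql
-- Sketch: every session currently waiting on a table metadata lock,
-- longest waiters first.
SELECT ID, USER, DB, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE STATE = 'Waiting for table metadata lock'
ORDER BY TIME DESC;
```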
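III. Putting the analysis together
With the analysis above, we can lay out the cause of this failure as follows:
1. A DML statement on table B was left uncommitted for a long time.
The statement takes InnoDB row locks on some rows of table B at the InnoDB layer.
2. Step 1 caused CREATE TABLE A AS SELECT B to block.
Under RR, the SELECT on B must lock the matching rows of B. Since step 1 already holds locks on them, the statement waits, with STATE Sending data.
3. Step 2 caused the other statements to block.
Until table A is fully created, CREATE TABLE A AS SELECT B holds MDL_EXCLUSIVE(X) on A. This lock blocks every other statement on table A, including statements such as DESC, SHOW TABLE STATUS and USE DB (without -A) that only take MDL_SHARED_HIGH_PRIO(SH). Their STATE is uniformly Waiting for table metadata lock.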
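The chain can be sketched as a multi-session sequence. The UPDATE in session 1 is a hypothetical stand-in for the long uncommitted DML. The sketch assumes RR isolation and a table b with at least one row with id = 1. Each session is a separate connection, and the statements are run in the order shown.

```sql
-- Session 1: leave a DML on b uncommitted (hypothetical statement).
BEGIN;
UPDATE b SET id = 2 WHERE id = 1;

-- Session 2: waits on the InnoDB row lock held by session 1,
-- while already holding the MDL lock on a.
CREATE TABLE a AS SELECT * FROM b;

-- Session 3: waits with STATE "Waiting for table metadata lock".
-- SELECT * FROM a or DROP TABLE a would wait in the same way.
SHOW TABLE STATUS LIKE 'a';

-- Session 1: committing releases the row lock, session 2 finishes,
-- and the MDL waiters are released in turn.
COMMIT;
```

A large innodb_lock_wait_timeout is needed for session 2 to keep waiting long enough to observe the other sessions.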
IV. Reproducing the problem
Test environment:
5.7.14
GTID disabled
RR isolation level
Script used:
create table b (id int);
insert into b values(1);
set global innodb_lock_wait_timeout=1000;
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME ='global_instrumentation';
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME ='wait/lock/metadata/sql/mdl';
select * from performance_schema.metadata_locks\G
(Reconnect afterwards: SET GLOBAL innodb_lock_wait_timeout only applies to sessions opened after the change.)
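Before running the test it can help to confirm that the setup took effect in the current session. The checks below are a sketch; the column aliases are mine, and tx_isolation is the 5.7 variable name (later versions use transaction_isolation).

```sql
-- Sketch: verify the test prerequisites from the new connection.

-- The session value should match the global value after reconnecting.
SELECT @@global.innodb_lock_wait_timeout  AS global_timeout,
       @@session.innodb_lock_wait_timeout AS session_timeout;

-- The MDL instrument and the global consumer should both be enabled.
SELECT NAME, ENABLED
FROM performance_schema.setup_instruments
WHERE NAME = 'wait/lock/metadata/sql/mdl';

SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME = 'global_instrumentation';

-- Isolation level and GTID mode of the test environment (MySQL 5.7 names).
SELECT @@session.tx_isolation AS isolation_level,
       @@global.gtid_mode     AS gtid_mode;
```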