Full-Time DBA: MySQL Troubleshooting Collection

Date: 2020-04-12

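Using the query rewrite plugin for emergency SQL optimization & degradation
Handling "Too many connections" & the MySQL 8.0 solution
Using strace to analyze how the database works
Using the sys schema for performance analysis
How to analyze a database crash
The three standard moves for troubleshooting MySQL performance problems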


-- Using the query rewrite plugin for emergency SQL optimization & degradation
Supported by ProxySQL
Supported by MySQL 5.7 and later
[dba@localhost:mysql.sock] [(none)]> show plugins;

db01 [/usr/local/mysql/share] 2020-02-04 16:50:14
root@pts/2 # mysql.dba <./install_rewriter.sql

db01 [/usr/local/mysql/share] 2020-02-04 16:51:16
root@pts/2 # mysql.dba

[dba@localhost:mysql.sock] [(none)]> show plugins;
+----------------------------+----------+--------------------+-------------+---------+
| Name                       | Status   | Type               | Library     | License |
+----------------------------+----------+--------------------+-------------+---------+
......
......
| Rewriter                   | ACTIVE   | AUDIT              | rewriter.so | GPL     |
+----------------------------+----------+--------------------+-------------+---------+
45 rows in set (0.00 sec)

[dba@localhost:mysql.sock] [(none)]> show global variables like 'rewriter_enabled';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| rewriter_enabled | ON    |
+------------------+-------+
1 row in set (0.00 sec)

vim my.cnf
[mysqld]
rewriter_enabled = on
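Note: to remove the feature, run the matching $basedir/share/uninstall_rewriter.sql. To switch rewriting off temporarily, set rewriter_enabled = off instead.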

[dba@localhost:mysql.sock] [(none)]> show databases like "query_rewrite";
+--------------------------+
| Database (query_rewrite) |
+--------------------------+
| query_rewrite            |
+--------------------------+
1 row in set (0.00 sec)
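
As a sketch of what an emergency rewrite could look like once the plugin is active: rules are rows in query_rewrite.rewrite_rules, and query_rewrite.flush_rewrite_rules() loads them into the plugin. The helper below only prints the SQL so it can be reviewed before it is piped to the client. The function names, the table t1 in the usage comment and the LIMIT-based replacement are hypothetical, not taken from the original notes. Keep in mind that MySQL 5.7 rewrites SELECT statements only.

```shell
# Double every single quote so the value stays inside one SQL string literal.
# Backslashes are not handled; this sketch assumes the rule text has none.
sql_quote() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# rewrite_rule_sql PATTERN REPLACEMENT [DEFAULT_DATABASE]
# Prints the statements that register one rule and reload the rule cache.
rewrite_rule_sql() {
    local pattern replacement db_sql
    pattern=$(sql_quote "$1")
    replacement=$(sql_quote "$2")
    db_sql="NULL"
    if [ -n "${3:-}" ]; then
        db_sql="'$(sql_quote "$3")'"
    fi
    printf 'INSERT INTO query_rewrite.rewrite_rules (pattern, replacement, pattern_database)\n'
    printf "VALUES ('%s', '%s', %s);\n" "$pattern" "$replacement" "$db_sql"
    printf 'CALL query_rewrite.flush_rewrite_rules();\n'
    # The message column explains why a rule failed to load, if it did.
    printf 'SELECT id, enabled, message FROM query_rewrite.rewrite_rules;\n'
}

# Hypothetical usage (mysql.dba is the client wrapper used in these notes):
#   rewrite_rule_sql 'SELECT * FROM t1 WHERE id = ?' \
#                    'SELECT * FROM t1 WHERE id = ? LIMIT 1' app01 | mysql.dba
```

The ? markers in the pattern match literal values in incoming statements, so one rule covers every value of id.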

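-- Handling "Too many connections" & the MySQL 8.0 solution
MySQL 5.x
    Keep max_user_connections < max_connections, so that one account cannot use up every slot:
    max_connections      = 1000
    max_user_connections = 800
MySQL 8.0
    admin_port
    admin_address
    create_admin_listener_thread

When Threads_running is very high: pt-kill / max_execution_time
thread_pool
Rule of thumb: Threads_running < CPU cores * 1.5

How many connections can the database receive?
max_user_connections * number of user accounts = conn
max_connections > max_user_connections * number of user accounts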
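
The two rules of thumb above can be checked with plain integer arithmetic. This is a minimal sketch; the function names are made up for illustration, and the numbers passed in would come from SHOW GLOBAL VARIABLES, SHOW GLOBAL STATUS and the account list.

```shell
# conn_budget MAX_CONNECTIONS MAX_USER_CONNECTIONS ACCOUNTS
# Worst case: every account opens max_user_connections sessions at once.
conn_budget() {
    local max_conn="$1" per_user="$2" accounts="$3"
    local worst=$(( per_user * accounts ))
    if [ "$worst" -lt "$max_conn" ]; then
        echo "ok: worst case $worst < max_connections $max_conn"
    else
        echo "risk: worst case $worst >= max_connections $max_conn"
        return 1
    fi
}

# running_ok THREADS_RUNNING CPU_CORES
# Integer form of Threads_running < cores * 1.5 (both sides doubled).
running_ok() {
    [ $(( $1 * 2 )) -lt $(( $2 * 3 )) ]
}
```

With the sample settings from the notes (1000 and 800), a single application account fits, while a second account with the same limit already exceeds max_connections in the worst case.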

-- Using strace to analyze how the database works
Basic usage
strace -T -tt -o /tmp/strace.log CMD
strace -T -tt CMD 2>&1 |tee /tmp/strace.log
strace -T -tt -s 100 -o /tmp/strace.log CMD
strace -T -tt -s 100 -ff -o /tmp/strace.log CMD
strace -T -tt -s 100 -e trace=XXXX -o /tmp/strace.log CMD
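
Because -T appends the time spent in each system call as <seconds> at the end of the line, a log written this way can be ranked by duration. The helper below is a sketch under that assumption; its name is hypothetical, and it expects one trace file as produced by -o (with -tt timestamps in the first column).

```shell
# slowest_syscalls LOGFILE [COUNT]
# Prints "seconds syscall timestamp" for the COUNT slowest completed calls.
slowest_syscalls() {
    local log="$1" top="${2:-10}"
    awk '
        # Only completed calls end with the elapsed time added by -T.
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            # Resumed calls look like "<... read resumed>", so the name is in $3.
            name = ($2 == "<...") ? $3 : $2
            sub(/\(.*/, "", name)
            printf "%s %s %s\n", secs, name, $1
        }
    ' "$log" | sort -rn | head -n "$top"
}
```

For example, slowest_syscalls /tmp/strace.log 5 lists the five calls that took longest, which is usually the quickest way to see whether time goes to fsync, read or lock waits.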

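Viewing threads from the operating system
db01 [~] 2020-02-04 17:42:30
root@pts/1 # ps -T `pidof mysqld`

Using pstack
db01 [~] 2020-02-04 17:43:40
root@pts/1 # pstack `pidof mysqld`

How it works:
db01 [~] 2020-02-04 17:47:33
root@pts/1 # gdb -p `pidof mysqld`
pstack attaches gdb to the process and calls bt inside it to print the stacks.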

MySQL 5.7 or later is recommended
[dba@localhost:mysql.sock] [(none)]> select thread_id, name from performance_schema.threads;

How to map a connection inside MySQL to a thread on the operating system
db01 [~] 2020-02-04 18:08:42
root@pts/1 # mysql.dba

[dba@localhost:mysql.sock] [(none)]> use performance_schema;
Database changed

[dba@localhost:mysql.sock] [performance_schema]> desc threads;
+---------------------+---------------------+------+-----+---------+-------+
| Field               | Type                | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| THREAD_ID           | bigint(20) unsigned | NO   |     | NULL    |       |
| NAME                | varchar(128)        | NO   |     | NULL    |       |
| TYPE                | varchar(10)         | NO   |     | NULL    |       |
| PROCESSLIST_ID      | bigint(20) unsigned | YES  |     | NULL    |       |
| PROCESSLIST_USER    | varchar(32)         | YES  |     | NULL    |       |
| PROCESSLIST_HOST    | varchar(60)         | YES  |     | NULL    |       |
| PROCESSLIST_DB      | varchar(64)         | YES  |     | NULL    |       |
| PROCESSLIST_COMMAND | varchar(16)         | YES  |     | NULL    |       |
| PROCESSLIST_TIME    | bigint(20)          | YES  |     | NULL    |       |
| PROCESSLIST_STATE   | varchar(64)         | YES  |     | NULL    |       |
| PROCESSLIST_INFO    | longtext            | YES  |     | NULL    |       |
| PARENT_THREAD_ID    | bigint(20) unsigned | YES  |     | NULL    |       |
| ROLE                | varchar(64)         | YES  |     | NULL    |       |
| INSTRUMENTED        | enum('YES','NO')    | NO   |     | NULL    |       |
| HISTORY             | enum('YES','NO')    | NO   |     | NULL    |       |
| CONNECTION_TYPE     | varchar(16)         | YES  |     | NULL    |       |
| THREAD_OS_ID        | bigint(20) unsigned | YES  |     | NULL    |       |
+---------------------+---------------------+------+-----+---------+-------+
17 rows in set (0.00 sec)

[dba@localhost:mysql.sock] [performance_schema]> show processlist;
+------+------+-----------+--------------------+---------+------+----------+------------------+
| Id   | User | Host      | db                 | Command | Time | State    | Info             |
+------+------+-----------+--------------------+---------+------+----------+------------------+
| 1751 | dba  | localhost | performance_schema | Query   |    0 | starting | show processlist |
+------+------+-----------+--------------------+---------+------+----------+------------------+
1 row in set (0.00 sec)

[dba@localhost:mysql.sock] [performance_schema]> select * from performance_schema.threads where processlist_id='1751'\G
*************************** 1. row ***************************
          THREAD_ID: 1776
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 1751
   PROCESSLIST_USER: dba
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: performance_schema
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: Sending data
   PROCESSLIST_INFO: select * from performance_schema.threads where processlist_id='1751'
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 13005
1 row in set (0.00 sec)

db01 [~] 2020-02-04 18:14:02
root@pts/2 # strace -T -tt -s 100 -ff -o /tmp/strace.log -p 13005
strace: Process 13005 attached with 29 threads

db01 [~] 2020-02-04 18:15:55
root@pts/3 # cd /tmp/
db01 [/tmp] 2020-02-04 18:15:58
root@pts/3 # ll
total 116
-rw-r--r-- 1 root root  78 Feb  4 18:16 strace.log.12966
-rw-r--r-- 1 root root  45 Feb  4 18:16 strace.log.12967
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12968
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12969
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12970
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12971
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12972
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12973
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12974
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12975
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12976
-rw-r--r-- 1 root root 499 Feb  4 18:16 strace.log.12977
-rw-r--r-- 1 root root 728 Feb  4 18:16 strace.log.12978
-rw-r--r-- 1 root root 728 Feb  4 18:16 strace.log.12980
-rw-r--r-- 1 root root 728 Feb  4 18:16 strace.log.12981
-rw-r--r-- 1 root root 287 Feb  4 18:16 strace.log.12982
-rw-r--r-- 1 root root 233 Feb  4 18:16 strace.log.12983
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12984
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12985
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12986
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12987
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12988
-rw-r--r-- 1 root root 287 Feb  4 18:16 strace.log.12989
-rw-r--r-- 1 root root 287 Feb  4 18:16 strace.log.12990
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12991
-rw-r--r-- 1 root root  62 Feb  4 18:16 strace.log.12992
-rw-r--r-- 1 root root  60 Feb  4 18:16 strace.log.12993
-rw-r--r-- 1 root root  78 Feb  4 18:16 strace.log.13005
-rw-r--r-- 1 root root  63 Feb  4 18:16 strace.log.13797

db01 [/tmp] 2020-02-04 18:17:31
root@pts/3 # ls -l *13005
-rw-r--r-- 1 root root 78 Feb  4 18:16 strace.log.13005

db01 [/tmp] 2020-02-04 18:18:09
root@pts/3 # tail -f strace.log.13005
18:16:02.840520 restart_syscall(<... resuming interrupted restart_syscall ...>
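
In the run above strace reported "attached with 29 threads" and wrote one file per thread. As far as the strace documentation describes it, that is because -f/-ff combined with -p attaches to every thread of the process, not only the given thread id. A sketch for tracing just the one connection follows; both helpers only print text, and their names and the output path are hypothetical.

```shell
# thread_lookup_sql CONNECTION_ID
# Prints a query that maps a SHOW PROCESSLIST Id to its OS thread id.
thread_lookup_sql() {
    case $1 in
        ''|*[!0-9]*) echo "connection id must be numeric" >&2; return 1 ;;
    esac
    printf 'SELECT thread_id, processlist_id, thread_os_id FROM performance_schema.threads WHERE processlist_id = %s;\n' "$1"
}

# trace_thread_cmd THREAD_OS_ID
# Prints (does not run) an strace command without -f/-ff, so that only
# the given OS thread is attached.
trace_thread_cmd() {
    case $1 in
        ''|*[!0-9]*) echo "thread id must be numeric" >&2; return 1 ;;
    esac
    printf 'strace -T -tt -s 100 -o /tmp/strace.%s.log -p %s\n' "$1" "$1"
}
```

For the session shown earlier, thread_lookup_sql 1751 yields the query whose THREAD_OS_ID was 13005, and trace_thread_cmd 13005 prints the matching single-thread strace command.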

[dba@localhost:mysql.sock] [performance_schema]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| app01              |
| app02              |
| app03              |
| mysql              |
| performance_schema |
| query_rewrite      |
| sys                |
+--------------------+
8 rows in set (0.00 sec)

db01 [~] 2020-02-04 18:54:50
root@pts/4 # pidof mysqld
12966

db01 [~] 2020-02-04 18:55:51
root@pts/4 # ls -l /proc/12966/fd/
total 0
lrwx------ 1 root root 64 Jan 30 02:02 0 -> /dev/pts/0 (deleted)
l-wx------ 1 root root 64 Jan 30 02:02 1 -> /data/mysql/3306/log/error.log
lrwx------ 1 root root 64 Jan 30 02:02 10 -> /data/mysql/3306/innodb_log/ib_logfile2
lrwx------ 1 root root 64 Jan 30 02:02 11 -> /data/mysql/3306/innodb_ts/ibdata1
lrwx------ 1 root root 64 Jan 30 02:02 12 -> /data/mysql/3306/data/general_ts01.ibd
lrwx------ 1 root root 64 Jan 30 02:02 13 -> /data/mysql/3306/tmpdir/ib8FJMzO (deleted)
lrwx------ 1 root root 64 Jan 30 02:02 14 -> /data/mysql/3306/innodb_ts/ibtmp1
lrwx------ 1 root root 64 Jan 30 02:02 15 -> /data/mysql/3306/data/mysql/plugin.ibd
lrwx------ 1 root root 64 Jan 30 02:02 16 -> /data/mysql/3306/data/mysql/help_keyword.ibd
lrwx------ 1 root root 64 Jan 30 02:02 17 -> /data/mysql/3306/data/mysql/time_zone_name.ibd
l-wx------ 1 root root 64 Jan 30 02:02 18 -> /data/mysql/3306/slowlog/slow-query.log
lrwx------ 1 root root 64 Jan 30 02:02 19 -> /data/mysql/3306/data/mysql/innodb_table_stats.ibd
l-wx------ 1 root root 64 Jan 29 12:24 2 -> /data/mysql/3306/log/error.log
lrwx------ 1 root root 64 Jan 30 02:02 20 -> /data/mysql/3306/data/mysql/time_zone.ibd
lrwx------ 1 root root 64 Jan 30 02:02 21 -> socket:[3512944]
lrwx------ 1 root root 64 Jan 30 02:02 22 -> /data/mysql/3306/data/mysql/time_zone_transition.ibd
lrwx------ 1 root root 64 Jan 30 02:02 23 -> /data/mysql/3306/data/mysql/time_zone_transition_type.ibd
lrwx------ 1 root root 64 Jan 30 02:02 24 -> /data/mysql/3306/data/mysql/time_zone_leap_second.ibd
lrwx------ 1 root root 64 Jan 30 02:02 25 -> /data/mysql/3306/data/mysql/innodb_index_stats.ibd
lrwx------ 1 root root 64 Jan 30 02:02 26 -> /data/mysql/3306/data/mysql/slave_relay_log_info.ibd
lrwx------ 1 root root 64 Jan 30 02:02 27 -> /data/mysql/3306/data/mysql/gtid_executed.ibd
lrwx------ 1 root root 64 Jan 30 02:02 28 -> /data/mysql/3306/data/mysql/slave_master_info.ibd
lrwx------ 1 root root 64 Jan 30 02:02 29 -> /data/mysql/3306/data/mysql/slave_worker_info.ibd
lrwx------ 1 root root 64 Jan 30 02:02 3 -> /data/mysql/3306/binlog/mysql-bin.index
lrwx------ 1 root root 64 Jan 30 02:02 30 -> socket:[322516]
lrwx------ 1 root root 64 Jan 30 02:02 31 -> /data/mysql/3306/data/mysql/server_cost.ibd
lrwx------ 1 root root 64 Jan 30 02:02 32 -> socket:[322517]
lrwx------ 1 root root 64 Jan 30 02:02 33 -> /data/mysql/3306/data/mysql/engine_cost.ibd
lrwx------ 1 root root 64 Jan 30 02:02 34 -> /data/mysql/3306/data/sys/sys_config.ibd
lrwx------ 1 root root 64 Jan 30 02:02 35 -> /data/mysql/3306/data/mysql/user.MYI
lrwx------ 1 root root 64 Jan 30 02:02 36 -> /data/mysql/3306/data/app01/t1.ibd
lrwx------ 1 root root 64 Jan 30 02:02 37 -> /data/mysql/3306/data/mysql/user.MYD
lrwx------ 1 root root 64 Jan 30 02:02 38 -> /data/mysql/3306/data/mysql/db.MYI
lrwx------ 1 root root 64 Jan 30 02:02 39 -> /data/mysql/3306/data/mysql/db.MYD
lrwx------ 1 root root 64 Jan 30 02:02 4 -> /data/mysql/3306/innodb_log/ib_logfile0
lrwx------ 1 root root 64 Jan 30 02:02 40 -> /data/mysql/3306/data/mysql/proxies_priv.MYI
lrwx------ 1 root root 64 Jan 30 02:02 41 -> /data/mysql/3306/data/mysql/proxies_priv.MYD
lrwx------ 1 root root 64 Jan 30 02:02 42 -> /data/mysql/3306/data/app01/t2.ibd
lrwx------ 1 root root 64 Jan 30 02:02 43 -> /data/mysql/3306/data/app02/t1.ibd
lrwx------ 1 root root 64 Jan 30 02:02 44 -> /data/mysql/3306/data/mysql/tables_priv.MYI
lrwx------ 1 root root 64 Jan 30 02:02 45 -> /data/mysql/3306/data/mysql/tables_priv.MYD
lrwx------ 1 root root 64 Jan 30 02:02 46 -> /data/mysql/3306/data/mysql/columns_priv.MYI
lrwx------ 1 root root 64 Jan 30 02:02 47 -> /data/mysql/3306/data/app02/t2.ibd
lrwx------ 1 root root 64 Jan 30 02:02 48 -> /data/mysql/3306/data/mysql/columns_priv.MYD
lrwx------ 1 root root 64 Jan 30 02:02 49 -> /data/mysql/3306/data/mysql/procs_priv.MYI
lrwx------ 1 root root 64 Jan 30 02:02 5 -> /data/mysql/3306/tmpdir/ibqlqnjD (deleted)
lrwx------ 1 root root 64 Jan 30 02:02 50 -> /data/mysql/3306/data/mysql/procs_priv.MYD
lrwx------ 1 root root 64 Jan 30 02:02 51 -> /data/mysql/3306/data/app03/t1.ibd
lrwx------ 1 root root 64 Jan 30 02:02 52 -> /data/mysql/3306/data/mysql/servers.ibd
lrwx------ 1 root root 64 Jan 30 02:02 53 -> /data/mysql/3306/data/app03/t2.ibd
lrwx------ 1 root root 64 Jan 30 02:02 54 -> /data/mysql/3306/data/mysql/event.MYI
lrwx------ 1 root root 64 Jan 30 02:02 55 -> /data/mysql/3306/data/app01/t3.ibd
lrwx------ 1 root root 64 Jan 30 02:02 56 -> /data/mysql/3306/data/mysql/event.MYD
lrwx------ 1 root root 64 Feb  1 18:00 58 -> /data/mysql/3306/data/mysql/proc.MYI
lrwx------ 1 root root 64 Feb  1 18:00 59 -> /data/mysql/3306/data/mysql/proc.MYD
lrwx------ 1 root root 64 Jan 30 02:02 6 -> /data/mysql/3306/tmpdir/ibgROYk4 (deleted)
l-wx------ 1 root root 64 Feb  4 18:55 60 -> /data/mysql/3306/binlog/mysql-bin.000010
lrwx------ 1 root root 64 Feb  4 18:55 61 -> /data/mysql/3306/data/query_rewrite/rewrite_rules.ibd
lrwx------ 1 root root 64 Feb  4 18:55 62 -> /data/mysql/3306/data/mysql/func.MYI
lrwx------ 1 root root 64 Feb  4 18:55 63 -> /data/mysql/3306/data/mysql/func.MYD
lrwx------ 1 root root 64 Feb  4 18:55 64 -> /data/mysql/3306/data/mysql/proc.MYD
lrwx------ 1 root root 64 Jan 30 02:02 7 -> /data/mysql/3306/tmpdir/ibO5kAmv (deleted)
lrwx------ 1 root root 64 Jan 30 02:02 8 -> /data/mysql/3306/tmpdir/ibqUMlrn (deleted)
lrwx------ 1 root root 64 Jan 30 02:02 9 -> /data/mysql/3306/innodb_log/ib_logfile1

-- Analyzing MySQL with pstack
db01 [~] 2020-02-04 19:06:44
root@pts/4 # yum -y install gdb

db01 [~] 2020-02-04 19:09:25
root@pts/2 # pstack `pidof mysqld`

-- Analyzing MySQL with perf
db01 [~] 2020-02-04 19:12:38
root@pts/2 # yum -y install perf

db01 [~] 2020-02-04 19:16:30
root@pts/2 # perf top
Samples: 738  of event 'cpu-clock', 4000 Hz, Event count (approx.): 184500000 lost: 0/0 drop: 0/0                            
Overhead  Shared Object       Symbol                                                                                         
  11.65%  [kernel]            [k] vsnprintf
   8.27%  [kernel]            [k] kallsyms_expand_symbol.constprop.1
   6.23%  [kernel]            [k] format_decode
   5.15%  libc-2.17.so        [.] __GI_____strtoull_l_internal
   5.01%  perf                [.] rb_next
   4.07%  [kernel]            [k] __memcpy
   4.07%  perf                [.] __dso__load_kallsyms
   3.79%  [kernel]            [k] number.isra.2
   2.57%  [kernel]            [k] string.isra.7
   2.44%  [kernel]            [k] module_get_kallsym
   2.30%  perf                [.] 0x00000000000d5cc4
   1.90%  [kernel]            [k] strnlen
   1.76%  libc-2.17.so        [.] _IO_feof
   1.63%  [kernel]            [k] __do_page_fault
   1.36%  perf                [.] rb_insert_color
   1.22%  libc-2.17.so        [.] __memcpy_sse2
   1.22%  libc-2.17.so        [.] __strcmp_sse42
   1.08%  [kernel]            [k] _raw_spin_unlock_irqrestore
   1.08%  [kernel]            [k] mem_cgroup_charge_common
   1.08%  [kernel]            [k] s_show
   1.08%  [kernel]            [k] system_call_after_swapgs


-- Using the sys schema for performance analysis
Recommended book: 《MySQL千金良方》

-- How to analyze a database crash
dmesg & /var/log/messages
MySQL's error.log

db01 [~] 2020-02-04 19:51:08
root@pts/4 # vim /etc/rsyslog.conf
kern.*                                                 /var/log/kern

db01 [~] 2020-02-04 19:54:40
root@pts/4 # systemctl restart rsyslog.service 
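
One common thing to look for in the kernel log after an unexplained mysqld exit is the OOM killer. The notes do not name it, so treat this as an assumption about a typical case. The sketch below greps a log file for the usual kernel phrases; the function name is hypothetical and the exact message wording varies between kernel versions.

```shell
# oom_kills [LOGFILE] [PROCESS_NAME]
# Prints kernel log lines that report an out-of-memory kill of the process.
# Defaults: /var/log/messages and mysqld.
oom_kills() {
    local log="${1:-/var/log/messages}" victim="${2:-mysqld}"
    grep -E 'Out of memory|Killed process' "$log" | grep -F "($victim)"
}
```

With the rsyslog rule above in place, oom_kills /var/log/kern checks the dedicated kernel log instead. An empty result only means no matching line was found, and the MySQL error.log still needs to be read.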


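-- The three standard moves for troubleshooting MySQL performance problems
System status: CPU, memory, I/O, network.
    CPU:
        top:
        High us: usually queries with ORDER BY or GROUP BY that sort without an index.
            Log in, find the offending SQL and add an index; CPU usage can drop to around 10% right away.
        High sy (4-10%): NUMA has probably not been disabled. Check with numactl --hardware and numastat.
            1. Disable NUMA in the BIOS (recommended).
            2. Disable NUMA in the boot configuration (grub.conf: numa=off).
            3. Start MySQL with an interleaved memory policy: numactl --interleave=all /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/conf/my.cnf &
        High wa, i.e. high I/O wait:
            # mysql.dba sys
                io_by_thread_by_latency
                io_global_by_file_by_bytes
                io_global_by_file_by_latency
                io_global_by_wait_by_bytes
                io_global_by_wait_by_latency
            desc io_global_by_file_by_bytes;
            select * from io_global_by_file_by_bytes limit 10;
            desc latest_file_io;
            select * from latest_file_io;
            pt-ioprofile
    NIC: check whether it is saturated (dstat, sar); consider NIC bonding.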

MySQL status
[dba@localhost:mysql.sock] [(none)]> show global status like "thread%";
[dba@localhost:mysql.sock] [(none)]> show processlist;
[dba@localhost:mysql.sock] [(none)]> show engine innodb status\G
[dba@localhost:mysql.sock] [(none)]> show global variables;
Check slow.log
Check vmstat 1 10
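
To connect the vmstat output with the us / sy / wa triage above, the CPU columns can be averaged over the samples. This is a rough sketch: the function name is hypothetical, it locates the columns from vmstat's own header line, and it does no more than print the averages so they can be compared with the notes.

```shell
# cpu_avg: reads vmstat output on stdin and prints average us, sy and wa.
cpu_avg() {
    awk '
        # The header line that starts with "r" names the columns.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; the "procs ---memory---" banner does not.
        $1 ~ /^[0-9]+$/ && ("us" in col) {
            us += $(col["us"]); sy += $(col["sy"]); wa += $(col["wa"]); n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "us=%d sy=%d wa=%d\n", us / n, sy / n, wa / n
        }
    '
}
```

Typical use would be vmstat 1 10 | cpu_avg. The first vmstat row reports averages since boot, so a longer run gives a more representative figure.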

Original article: https://www.cnblogs.com/zhouwanchun/p/12687610.html