MySQL 5.7 X Plugin:流水线技术vs.并行查询技术

时间:2022-04-28
本文章向大家介绍MySQL 5.7 X Plugin:流水线技术vs.并行查询技术,主要内容包括其使用实例、应用技巧、基本知识点总结和需要注意事项,具有一定的参考价值,需要的朋友可以参考一下。

本文将以MySQL 5.7 X Plugin为例,对比分析流水线(pipelining)和并行查询技术。

另一篇博文《MySQL 5.7 X Plugin支持异步查询》(Asynchronous Query Execution with MySQL 5.7 X Plugin),介绍了运行MySQL 5.7 X Plugin的方法:

  • Hash分区
  • 开放MySQL的CPU内核数连接

由于5.7 X Plugin只支持流水线技术(缩短往返延时),且不支持MySQL连接复用(MySQL在执行单项查询时,不启动CPU多核),所以需要手动操作(包括最终的结果排序)。

TL:DR; 版本

本文将分析MySQL 5.7 X Plugin、X协议和文档存储,并加以总结:

  1. 5.7 X Plugin不支持MySQL连接/会话复用。类似初始协议,每条X插件的连接都会打开一次MySQL会话;
  2. 一条5.7 X Plugin指令(在库支持的情况下)会立刻返回,且查询终止之前,仍可继续操作(异步调用)。MySQL以队列形式运行;
  3. 5.7 X Plugin没有附加的服务器级持久性配置。若不进行核查或等待服务器确认(异步),则不能确保数据写入MySQL(“fire and forget”模式)。

同时,X协议适用于以下情况:

  • 在MySQL表锁定时,想要实现一个异步客户端(不希望限制网络通信,如下载或API调用);
  • 想要使用MySQL队列,缩短往返延时。

Benchmark评价结果:流水线vs.并行vs.单项查询

多组测试对比流水线、并行处理和单项查询,结论如下:

1. 并行查询支持NodeJS:

$ time node async_wikistats.js
...
All done! Total: 17753
...
real    0m30.668s
user    0m0.256s
sys     0m0.028s

2. 流水线技术支持NojeJS:

$ time node async_wikistats_pipeline.js
...
All done! Total: 17753
...
real 5m39.666s
user 0m0.212s
sys 0m0.024s

测试中针对NojeJS环境下运行的流水线,复用相同连接(并且不对线程开放新连接)。

3. 直接查询——分区表:

mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark_part where url like ‘%postgresql%’;
+-----------------+
| sum(tot_visits) |
+-----------------+
| 17753           |
+-----------------+
1 row in set (5 min 31.44 sec)

4. 直接查询——非分区表

mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark where url like ‘%postgresql%’;
+-----------------+
| sum(tot_visits) |
+-----------------+
| 17753           |
+-----------------+
1 row in set (4 min 38.16 sec)

5.7 X Plugin流水线技术的优点:

尽管5.7 X Plugin的流水线技术没有增加太多查询响应时间(可以降低总延迟),其仍适用于某些场合,例如从网上下载东西的时候,需要缩减下载进程和文档元数据,这时就可使用youtube-dl进行搜索并下载YouTube的视频元数据。然后,将原数据JSON存入MySQL 5.7文本库。代码如下:

var mysqlx = require('mysqlx');
# This is the same as running $ youtube-dl -j -i ytsearch100:"mysql 5.7"
const spawn = require('child_process').spawn;
const yt = spawn('youtube-dl', ['-j', '-i', 'ytsearch100:"mysql 5.7"'], {maxBuffer: 1024 * 1024 * 128});
var mySession =
mysqlx.getSession({
    host: 'localhost',
    port: 33060,
    dbUser: 'root',
    dbPassword: '<your password>'
});
yt.stdout.on('data', (data) => {
        try {
                dataObj = JSON.parse(data);
                console.log(dataObj.fulltitle);
                mySession.then(session => {
                                                session.getSchema("yt").getCollection("youtube").add(  dataObj  )
                                                .execute(function (row) {
                                                }).catch(err => {
                                                        console.log(err);
                                                })
                                                .then( function (notices) { console.log("Wrote to MySQL: " + JSON.stringify(notices))  });
                                }).catch(function (err) {
                                              console.log(err);
                                              process.exit();
                                });
        } catch (e) {
                console.log(" --- Can't parse json" + e );
        }
});
yt.stderr.on('data', (data) => {
  console.log("Error receiving data");
});
yt.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
  mySession.then(session => {session.close() } );
});

如上例所示,执行二进制youtube-dl(需要提前安装)来搜索MySQL 5.7相关视频。无需下载视频,只需获取JSON格式的元数据(“-j”flag)。因为是JSON格式,所以可以保存至MySQL文本库。表架构如下:

CREATE TABLE `youtube` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  UNIQUE KEY `_id` (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

执行案例:

$ node yt.js
What's New in MySQL 5.7
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["3f312c3b-b2f3-55e8-0ee9-b706eddf"]}}
MySQL 5.7: MySQL JSON data type example
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["88223742-9875-59f1-f535-f1cfb936"]}}
MySQL Performance Tuning: Part 1. Configuration (Covers MySQL 5.7)
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["c377e051-37e6-8a63-bec7-1b81c6d6"]}}
Dave Stokes — MySQL 5.7 - New Features and Things That Will Break — php[world] 2014
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["96ae0dd8-9f7d-c08a-bbef-1a256b11"]}}
MySQL 5.7 & JSON: New Opportunities for Developers - Thomas Ulin - Forum PHP 2015
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["ccb5c53e-561c-2ed5-6deb-1b325739"]}}
Cara Instal MySQL 5.7.10 NoInstaller pada Windows Manual Part3
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["95efbd79-8d79-e7b6-a535-271640c8"]}}
MySQL 5.7 Install and Configuration on Ubuntu 14.04
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["b8cfe132-aca4-1eba-c2ae-69e48db8"]}}

有趣的是:当NodeJS + X Plugin = Asynchronous + Pipelining时,一旦表被锁定,程序操作不会停止。现已打开两个会话:

  • session 1: $ node yt.js > test_lock_table.log
  • session 2:
mysql> lock table youtube read; select sleep(10); unlock tables;
Query OK, 0 rows affected (0.00 sec)
+-----------+
| sleep(10) |
+-----------+
|         0 |
+-----------+
1 row in set (10.01 sec)
Query OK, 0 rows affected (0.00 sec)

由此可见,前两项是即时写入。随后,表被锁定,且无MySQL查询。同时,下载进程(此处为速度最慢的环节)继续进行,没有锁定(上述titles没有附加lines“… => wrote to MySQL:”)。表解锁后,大量等待查询通过。

这有助于下载进程的运行,而网络则成了瓶颈。传统同步查询中,表锁定则应用程序锁定(包括网络通信)。但如果有NodeJS和5.7 X Plugin,则MySQL做队列运行,下载环节继续。

流水线持久性

为了研究流水线的持久性,或者说连接中断后会发生的事,做了一些测试。这一次,在开始NodeJS之前锁定表格,切断连接,最后再解锁表格。结果如下:

Session 1:
----------
mysql> truncate table youtube_new;
Query OK, 0 rows affected (0.25 sec)
mysql> lock table youtube_new read;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
Session 2:
----------
(when table is locked)
$ node yt1.js
11 03  MyISAM
Switching to InnoDB from MyISAM
tablas InnoDB a MyISAM
MongoDB vs MyISAM (MariaDB/MySQL)
MySQL Tutorial 35 - Foreign Key Constraints for the InnoDB Storage Engine
phpmyadmin foreign keys myisam innodb
Convert or change database manual from Myisam to Innodb
... >100 other results omited ...
^C
Session 1:
----------
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
     Id: 4916
   User: root
   Host: localhost:33221
     db: NULL
Command: Query
   Time: 28
  State: Waiting for table metadata lock
   Info: PLUGIN: INSERT INTO `iot`.`youtube_new` (doc) VALUES ('{"upload_date":"20140319","protocol":"
mysql> unlock table;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
mysql>  select json_unquote(doc->'$.title') from youtube_new;
+---------------------------------+
| json_unquote(doc->'$.title')    |
+---------------------------------+
| 11 03  MyISAM                   |
| Switching to InnoDB from MyISAM |
+---------------------------------+
2 rows in set (0.00 sec)

注意:上例中,没有出现MySQL服务器发出的确认字符。代码接收MySQL回复,列出指令“Wrote to MySQL: {“_state”:{“rows_affected”:1,”doc_ids”:[“…”]}}”。此外,切断连接后,MySQL进程还在表锁等待。

有趣的是此处只有两行被插入文本库。X协议的作者Jan Kneschke被问及此处是否有“history length”或是其它可扩展的缓存(buffer)时,回答如下:

  • Q:是否有history length或缓存,而且我们能够加以调控吗?

A:此处完全不存在“history”或“缓存”,且其都处于连接器层面。

  • Q:为什么最终插入两行?

A:对于这个问题,我已经连接了tcpdump与port 33060(X协议),如下:

这条信息非常重要,要记住异步流水线的配置不是一尘不变的:如果应用程序出错且存在pending writes,则writes可能丢失(或被写入)。

为了充分了解该协议的运行方式,截获tcpdump(并在Jan Kneschke的帮助下进行分析):

tcpdump -i lo -s0 -w tests/node-js-pipelining.pcap "tcp port 33060"

(tcpdump可视化更新如下)

  • 点击CTRL+C,NodeJS关闭连接。由于表仍锁定,MySQL不能写,且无法返回插入结果。
  • 表解锁后,尽管连接已锁定,仍可开始第一个statement。随后确认第一项插入,并开始下一项。
  • 然而,此时脚本(客户端)已经关闭连接,且最后的数据包(此处是id)被否定。5.7 X Plugin由此发现客户端关闭连接并停止执行流水线。

实际上,这与初始MySQL协议的运行方式非常相像。如果切断脚本/应用程序,MySQL连接不会自动中断(除非在MySQL客户端点击CTRL+C,发出切断信号),且该连接会等待表解锁。表解锁后,它会插入一个文件中的第一个statement。

实施unique check

若重启脚本,会再次找出相同的视频,故而可能需要提升数据一致性。默认条件下,插件生成文档唯一键,从而阻止副本插入。

另一条执行unique check的途径是为youtube id创建唯一键。更新表结构如下:

CREATE TABLE `youtube` (
  `doc` json DEFAULT NULL,
  `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
  UNIQUE KEY `youtube_id` (`youtube_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

将默认“id”列改为YouTube唯一ID。重启脚本,显示如下:

MySQL 5.7: Merge JSON data using MySQL
{ [Error: Document contains a field value that is not unique but required to be]
  info:
   { severity: 0,
     code: 5116,
     msg: 'Document contains a field value that is not unique but required to be',
     sql_state: 'HY000' } }
... => wrote to MySQL: undefined

……文档已加载完毕。

结论

尽管X插件流水线不一定能显著提升查询回复(可能缩短往返延时),但仍然能有助于一些应用程序的运行。例如当MySQL表被锁定时,可能不希望网络通信受阻(例如下载或API调用)。同时,除非查询/等待服务器发出的确认字符,否则数据不一定会写入MySQL。

数据分析

下载结果如下:

    "is_live": null,
    "license": "Standard YouTube License",
    "duration": 2965,
    "end_time": null,
    "playlist": ""mysql 5.7"",
    "protocol": "https",
    "uploader": "YUI Library",
    "_filename": "Douglas Crockford - The JSON Saga--C-JoyNuQJs.mp4",
    "age_limit": 0,
    "alt_title": null,
    "extractor": "youtube",
    "format_id": "18",
    "fulltitle": "Douglas Crockford: The JSON Saga",
    "n_entries": 571,
    "subtitles": {},
    "thumbnail": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg",
    "categories": ["Science & Technology"],
    "display_id": "-C-JoyNuQJs",
    "like_count": 251,
    "player_url": null,
    "resolution": "640x360",
    "start_time": null,
    "thumbnails": [{
        "id": "0",
        "url": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg"
    }],
    "view_count": 36538,
    "annotations": null,
    "description": "Yahoo! JavaScript architect Douglas Crockford tells the story of how JSON was discovered and how it became a major standard for describing data.",
    "format_note": "medium",
    "playlist_id": ""mysql 5.7"",
    "upload_date": "20110828",
    "uploader_id": "yuilibrary",
    "webpage_url": "https://www.youtube.com/watch?v=-C-JoyNuQJs",
    "uploader_url": "http://www.youtube.com/user/yuilibrary",
    "dislike_count": 5,
    "extractor_key": "Youtube",
    "average_rating": 4.921875,
    "playlist_index": 223,
    "playlist_title": null,
    "automatic_captions": {},
    "requested_subtitles": null,
    "webpage_url_basename": "-C-JoyNuQJs"

为了看到最流行的视频,在view_count基础上又添加了一个virtual field,并据此构建了一个索引。

CREATE TABLE `youtube` (
  `doc` json DEFAULT NULL,
  `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
  `view_count` int(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.view_count'))) VIRTUAL,
  UNIQUE KEY `youtube_id` (`youtube_id`),
  KEY `view_count` (`view_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

查询运行如下:

mysql> select json_unquote(doc->'$.title'),
    -> view_count,
    -> json_unquote(doc->'$.dislike_count') as dislikes
    -> from youtube
    -> order by view_count desc
    -> limit 10;
+----------------------------------------------------------------------------------------------------+------------+----------+
| json_unquote(doc->'$.title')                                                                       | view_count | dislikes |
+----------------------------------------------------------------------------------------------------+------------+----------+
| Beginners MYSQL Database Tutorial 1 # Download , Install MYSQL and first SQL query                 |     664153 | 106      |
| MySQL Tutorial                                                                                     |     533983 | 108      |
| PHP and MYSQL - Connecting to a Database and Adding Data                                           |     377006 | 50       |
| PHP MySQL Tutorial                                                                                 |     197984 | 41       |
| Installing MySQL (Windows 7)                                                                       |     196712 | 28       |
| Understanding PHP, MySQL, HTML and CSS and their Roles in Web Development - CodersCult Webinar 001 |     195464 | 24       |
| jQuery Ajax Tutorial #1 - Using AJAX & API's (jQuery Tutorial #7)                                  |     179198 | 25       |
| How To Root Lenovo A6000                                                                           |     165221 | 40       |
| MySQL Tutorial 1 - What is MySQL                                                                   |     165042 | 45       |
| How to Send Email in Blackboard Learn                                                              |     144948 | 28       |
+----------------------------------------------------------------------------------------------------+------------+----------+
10 rows in set (0.00 sec)

为了找出最受欢迎的决议,如下:

mysql> select count(*) as cnt,
    -> sum(view_count) as sum_views,
    -> json_unquote(doc->'$.resolution') as resolution
    -> from youtube
    -> group by resolution
    -> order by cnt desc, sum_views desc
    -> limit 10;
+-----+-----------+------------+
| cnt | sum_views | resolution |
+-----+-----------+------------+
| 273 |   3121447 | 1280x720   |
|  80 |   1195865 | 640x360    |
|  18 |     33958 | 1278x720   |
|  15 |     18560 | 1152x720   |
|  11 |     14800 | 960x720    |
|   5 |      6725 | 1276x720   |
|   4 |     18562 | 1280x682   |
|   4 |      1581 | 1280x616   |
|   4 |       348 | 1280x612   |
|   3 |      2024 | 1200x720   |
+-----+-----------+------------+
10 rows in set (0.02 sec)

在此特别感谢Oracle的Jan Kneschke和Morgan Tocker在X协议内部构件方面提供的帮助。