MySQL replace into的使用细则(r10笔记第48天)

在Oracle中有merge into的语法，可以达到一个语句完成同时修改，添加数据的功能，MySQL里面没有merge into的语法，却有replace into。我们来看看replace into的使用细则。为了方便演示，我首先创建一个表 users create table users( user_id int(11) unsigned not null, user_name varchar(64) default null, primary key(user_id) )engine=innodb default charset=UTF8; 插入2行数据，可能搞Oracle的同学就不适应了，SQL怎么能这么写，不过用起来确实蛮有意思。 > insert into users (user_id,user_name) values(1,'aa'),(2,'bb'); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 数据情况如下： > select * from users; +---------+-----------+ | user_id | user_name | +---------+-----------+ | 1 | aa | | 2 | bb | +---------+-----------+ 2 rows in set (0.00 sec) 好了，我们来看看replace into的使用，如果向表里插入数据，表里已经存在同样的数据，replace into是会直接更新还是会删除，然后插入。要搞明白这一点很重要，因为这个直接会影响到数据的准确性。我们先看看replace into的使用。比如插入下面的一条记录。 > replace into users(user_id, user_name) values(1, 'cc'); Query OK, 2 rows affected (0.00 sec) 完成之后数据的情况如下： > select * from users; +---------+-----------+ | user_id | user_name | +---------+-----------+ | 1 | cc | | 2 | bb | +---------+-----------+ 2 rows in set (0.00 sec) 看来数据像是被替换了，又好像是删除后，重新覆盖的。怎么验证呢。我们可以先试试trace的方法。是否能够有所收获。首先用explain extended的方式，这种方式会得到很多执行计划的细节信息。

根据输出来看，这种方式得不到预期的数据结果。我们换一个方式，在5.6以上版本使用optimizer_trace > set optimizer_trace="enabled=on"; Query OK, 0 rows affected (0.00 sec) > replace into users(user_id, user_name) values(1, 'dd'); Query OK, 2 rows affected (0.01 sec) 输出结果如下，还是没有得到很详细的信息。

这个时候不要气馁，要知道办法总比困难多。我们可以换一个新的思路来测试，而且还能顺带验证，何乐而不为。我们重新创建一个表users2,和users的唯一不同在于user_id使用了auto_increment的方式。 CREATE TABLE `users2` ( user_id int(11) unsigned not null AUTO_INCREMENT, user_name varchar(64) default null, primary key(user_id) )engine=innodb default charset=UTF8; 插入3行数据。 > INSERT INTO users2 (user_id,user_name) VALUES (1, 'aa'), (2, 'bb'), (3, 'cc'); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 这个时候查看建表的DDL如下： > SHOW CREATE TABLE users2G *************************** 1. row *************************** Table: users2 Create Table: CREATE TABLE `users2` ( `user_id` int(11) unsigned NOT NULL AUTO_INCREMENT, `user_name` varchar(64) DEFAULT NULL, PRIMARY KEY (`user_id`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 1 row in set (0.01 sec) 数据情况如下： > SELECT * FROM users2 ; +---------+-----------+ | user_id | user_name | +---------+-----------+ | 1 | aa | | 2 | bb | | 3 | cc | +---------+-----------+ 3 rows in set (0.00 sec) 我们先做一个replace into的操作。 > REPLACE INTO users2 (user_id,user_name) VALUES (1, 'dd'); Query OK, 2 rows affected (0.00 sec) 数据情况如下，原来user_id为1的数据做了变更。 > SELECT * FROM users2; +---------+-----------+ | user_id | user_name | +---------+-----------+ | 1 | dd | | 2 | bb | | 3 | cc | +---------+-----------+ 3 rows in set (0.01 sec) 再次查看auto_increment的值还是4 > SHOW CREATE TABLE users2G *************************** 1. row *************************** Table: users2 Create Table: CREATE TABLE `users2` ( `user_id` int(11) unsigned NOT NULL AUTO_INCREMENT, `user_name` varchar(64) DEFAULT NULL, PRIMARY KEY (`user_id`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 1 row in set (0.00 sec) 这个时候还是很难得出一个结论，切记不要想当然。replace into需要表中存在主键或者唯一性索引，user_id存在主键，我们给user_name创建一个唯一性索引。 > alter table users2 add unique key users2_uq_name(user_name); Query OK, 0 rows affected (0.06 sec) Records: 0 Duplicates: 0 Warnings: 0 好了，重要的时刻到了，我们看看下面的语句的效果。只在语句中提及user_name,看看user_id是递增还是保留当前的值。 > REPLACE INTO users2 (user_name) VALUES ('dd'); Query OK, 2 rows affected (0.00 sec) 可以看到user_id做了递增，也就意味着这是一个全新的insert插入数据。 > select * from users2; +---------+-----------+ | user_id | user_name | +---------+-----------+ | 2 | bb | | 3 | cc | | 4 | dd | +---------+-----------+ 3 rows in set (0.00 sec) 这个时候再次查看建表的DDL如下,auto_increment确实是递增了。 CREATE TABLE `users2` ( `user_id` int(11) unsigned NOT NULL AUTO_INCREMENT, `user_name` varchar(64) DEFAULT NULL, PRIMARY KEY (`user_id`), UNIQUE KEY `users2_uq_name` (`user_name`) ) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 所以通过上面的测试和推理我们知道，replace into是delete,insert的操作，而非基于当前数据的update。如此一来我们使用replace into的时候就需要格外注意，可能有些操作非我们所愿，如果插入数据时存在重复的数据，是更新当前记录的情况，该怎么办呢，可以使用replace into的姊妹篇语句，insert into on duplicate key 的方式，后面需要使用update选项。比如我们还是基于上面的数据，插入user_name为'dd'的数据，如果存在则修改。 > INSERT INTO users2 (user_name) VALUES ('dd') ON DUPLICATE KEY UPDATE user_name=VALUES(user_name); Query OK, 0 rows affected (0.00 sec) 根据运行结果来看，没有修改数据，比我们期望的还要好一些。所以任何语句和功能都不是万能的，还得看场景，脱离了使用场景就很难说得清了。此外，补充replace into的另外一种使用方式，供参考。 > replace into users2(user_id,user_name) select 2,'bbbb' ; Query OK, 2 rows affected (0.01 sec) Records: 1 Duplicates: 1 Warnings: 0 > select *from users2; +---------+-----------+ | user_id | user_name | +---------+-----------+ | 2 | bbbb | | 3 | cc | | 4 | dd | +---------+-----------+ 3 rows in set (0.00 sec) 其实再次查看replace into的使用，发现日志中已经赫然提醒，2 rows affected.当然我们有过程有结论，也算是一种不错的尝试了。