评论

小程序商城产品中MySQL索引优化实战

本文是我们通过test case来对mysql索引进行优化过程,并总结出研发在使用mysql语句的过程中需要注意到的效率差异。

创建 test 表

真正的工作中,一张表只有5-6个字段的话,不大可能会建4个索引,这里只是为了学习演示效果,来看索引的执行优化情况


CREATE TABLE `test` (


`id` int(11) NOT NULL AUTO_INCREMENT,


`c1` varchar(10) DEFAULT NULL,


`c2` varchar(10) DEFAULT NULL,


`c3` varchar(10) DEFAULT NULL,


`c4` varchar(10) DEFAULT NULL,


`c5` varchar(10) DEFAULT NULL,


PRIMARY KEY (`id`),


KEY `idx_test_c1234` (`c1`,`c2`,`c3`,`c4`) USING BTREE


) ENGINE=InnoDB DEFAULT CHARSET=utf8

创建测试数据


INSERT INTO test(c1 , c2 , c3 , c4 , c5)


VALUES('a1' , 'a2' , 'a3' , 'a4' , 'a5');




INSERT INTO test(c1 , c2 , c3 , c4 , c5)


VALUES('b1' , 'b2' , 'b3' , 'b4' , 'b5');




INSERT INTO test(c1 , c2 , c3 , c4 , c5)


VALUES('c1' , 'c2' , 'c3' , 'c4' , 'c5');




INSERT INTO test(c1 , c2 , c3 , c4 , c5)


VALUES('d1' , 'd2' , 'd3' , 'd4' , 'd5');




INSERT INTO test(c1 , c2 , c3 , c4 , c5)


VALUES('e1' , 'e2' , 'e3' , 'e4' , 'e5');



分析以下 Case 索引使用情况

Case 1


explain select * from test where c1='a1' and c2='a2' and c3='a3' and c4='a4';




explain select * from test where c1='a1' and c3='a3' and c2='a2' and c4='a4';




explain select * from test where c1='a1' and c4='a4' and c3='a3' and c2='a2';




explain select * from test where c4='a4' and c3='a3' and c2='a2' and c1='a1';

:以上四组explain是不是都能用到索引?


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------------------+------+----------+-------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref                     | rows | filtered | Extra |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------------------+------+----------+-------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 132     | const,const,const,const |    1 |   100.00 | NULL  |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------------------+------+----------+-------+


1 row in set, 1 warning (0.00 sec)

分析:上述四组explain都能用到索引,并且type=ref,key_len=132,ref=const,const,const,const, 是因为MySQL底层的优化器会对sql进行优化,在执行常量等值查询时,改变索引列的顺序并不会更改explain的执行结果,但是推荐按照索引顺序列编写sql语句。

Case 2

Case 2.1


mysql> explain select * from test where c1='a1' and c2='a2';


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 66      | const,const |    1 |   100.00 | NULL  |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+


1 row in set, 1 warning (0.00 sec)

分析:上述explain的两个条件肯定都会走索引


mysql> explain select * from test where c1='a1' and c2='a2' and c3 > 'a3' and c4='c4';


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+


| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                 |


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+


|  1 | SIMPLE      | test  | NULL       | range | idx_test_c1234 | idx_test_c1234 | 99      | NULL |    1 |    20.00 | Using index condition |


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+


1 row in set, 1 warning (0.00 sec)

分析:当出现范围的时候,type=range,key_len=99,比不用范围key_len=66增加了,说明使用上了索引,但对比Case1中执行结果,说明c4上索引失效。结合底层的数据结构,必须要通过等于才能通过b+tree的算法找到某一个值,如果通过比较的话,相当于后面就断了,但是范围当前位置(c3)的索引是有效的,从key_len=99可证明

Case 2.2


mysql> explain select * from test where c1='a1' and c2='a2' and c4>'a4' and c3 = 'a3';


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+


| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                 |


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+


|  1 | SIMPLE      | test  | NULL       | range | idx_test_c1234 | idx_test_c1234 | 132     | NULL |    1 |   100.00 | Using index condition |


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+


1 row in set, 1 warning (0.00 sec)

分析:与上面explain执行结果对比,key_len=132说明索引用到了4个,因为对此sql语句MySQL底层优化器会进行优化。

Case 2.3


mysql> explain select * from test where c1>'a1' and c2='a2' and c3='a3' and c4 = 'a4';


+----+-------------+-------+------------+------+----------------+------+---------+------+------+----------+-------------+


| id | select_type | table | partitions | type | possible_keys  | key  | key_len | ref  | rows | filtered | Extra       |


+----+-------------+-------+------------+------+----------------+------+---------+------+------+----------+-------------+


|  1 | SIMPLE      | test  | NULL       | ALL  | idx_test_c1234 | NULL | NULL    | NULL |    5 |    20.00 | Using where |


+----+-------------+-------+------------+------+----------------+------+---------+------+------+----------+-------------+


1 row in set, 1 warning (0.00 sec)

分析:上述explain我在没有执行时,是认为会走索引的,执行之后发现没有走索引,经过分析,这里违背了最左前缀法则(如果最左前列的索引失效,则后面的索引都失效),因为c1主要用于范围查询,而不是查询,解决方式可以使用覆盖索引

Case 3

Case 3.1


mysql> explain select * from test where c1='a1' and c2='a2' and c4 = 'a4' order by c3;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra                 |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 66      | const,const |    1 |    20.00 | Using index condition |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+


1 row in set, 1 warning (0.00 sec)

分析:利用最左前缀法则:中间兄弟不能断,因此查询时用到了c1c2索引,从key_len=66,ref=const,const可以看出来,另外排序时用到了c3索引。

Case 3.2


mysql> explain select * from test where c1='a1' and c2='a2'  order by c3;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra                 |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 66      | const,const |    1 |   100.00 | Using index condition |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+


1 row in set, 1 warning (0.00 sec)

分析: 从explain的执行结果来看:key_len=66,ref=const,const,从而查询时只用到c1c2索引,排序时用到了c3的索引。

Case 3.3


mysql> explain select * from test where c1='a1' and c2='a2'  order by c4;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+---------------------------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra                                 |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+---------------------------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 66      | const,const |    1 |   100.00 | Using index condition; Using filesort |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+---------------------------------------+


1 row in set, 1 warning (0.00 sec)

分析:从explain的执行结果来看:key_len=66,ref=const,const,查询使用了c1c2索引,由于用了c4进行排序,跳过了c3,所有c4没有用到索引,出现了Using filesort

Case 4

Case 4.1


mysql> explain select * from test where c1='a1' and c5='a5'  order by c2,c3;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+------------------------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref   | rows | filtered | Extra                              |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+------------------------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 33      | const |    1 |    20.00 | Using index condition; Using where |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+------------------------------------+


1 row in set, 1 warning (0.00 sec)




mysql>

分析:查询用到了c1索引,排序上用到c2c3的索引,看最后的Extra列中没有出现Using filesort(文件排序)。

Case 4.2


mysql> explain select * from test where c1='a1' and c5='a5'  order by c3,c2;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+----------------------------------------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref   | rows | filtered | Extra                                              |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+----------------------------------------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 33      | const |    1 |    20.00 | Using index condition; Using where; Using filesort |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+----------------------------------------------------+


1 row in set, 1 warning (0.00 sec)

分析:和Case 4.1的区别是,颠倒了排序的字段,通过explain的执行结果来看,出现了Using filesort,因为索引的创建顺序为c1,c2,c3,c4,但是排序的时候c2c3颠倒位置了, 不符合最左前缀法则

Case 4.3


mysql> explain select * from test where c1='a1' and c2='a2' and c5='a5'  order by c2,c3;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+------------------------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra                              |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+------------------------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 66      | const,const |    1 |    20.00 | Using index condition; Using where |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+------------------------------------+


1 row in set, 1 warning (0.00 sec)

分析: 在查询时增加了c5字段,通过explain的执行结果来看,只用到了c1c2的索引,因为c5并未创建索引。

Case 4.4


mysql>  explain select * from test where c1='a1' and c2='a2' and c5='a5'  order by c3,c2;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+------------------------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra                              |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+------------------------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 66      | const,const |    1 |    20.00 | Using index condition; Using where |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------------+------+----------+------------------------------------+


1 row in set, 1 warning (0.00 sec)

分析: 和Case 4.3相比,颠倒了排序字段, 那么颠倒之后为何也能用到索引?怎么没用到文件排序呢?
还是拿最左前缀原理来分析,因为c1c2在查询时,已经通过索引筛选出来了结果(并且是排好序的),那么order by中的c2字段已经没有多大意义,只需要做c3排序即可。

Case 5

Case 5.1


mysql> explain select * from test where c1='a1' and c4='a4' group by c2,c3;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+-----------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref   | rows | filtered | Extra                 |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+-----------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 33      | const |    1 |    20.00 | Using index condition |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+-----------------------+


1 row in set, 1 warning (0.00 sec)

请注意在MySQL 5.7.x版本中,默认开启了only_full_group_by模式,就是说如果select中的列,没有在group by中出现时,执行这个sql 就会报错,我本地为了方便演示,临时先去掉only_full_group_by的限制,通过set @@global.sql_mode ='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';语句来设置,建议生产环境别这么玩,写的sql还是要规范一些。

分析: 只用到c1上的索引,因为c4中间间断了,根据最左前缀法则key_len=33,ref=const结果来看,表示只用到一个c1索引。

Case 5.2


mysql> explain select * from test where c1='a1' and c4='a4' group by c3,c2;


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+--------------------------------------------------------+


| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref   | rows | filtered | Extra                                                  |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+--------------------------------------------------------+


|  1 | SIMPLE      | test  | NULL       | ref  | idx_test_c1234 | idx_test_c1234 | 33      | const |    1 |    20.00 | Using index condition; Using temporary; Using filesort |


+----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+--------------------------------------------------------+


1 row in set, 1 warning (0.00 sec)

分析: 和Case 5.1相比,颠倒了排序字段,这里出现了Using temporary; Using filesort,就是说做分组时,创建了临时表文件排序, 效率极低。因为c3c2的索引顺序是相反的。

MySQL在做group by时,首先会做一次order by的操作

Case 6


mysql> explain select * from test where c1>'a1' order by c1;


+----+-------------+-------+------------+------+----------------+------+---------+------+------+----------+-----------------------------+


| id | select_type | table | partitions | type | possible_keys  | key  | key_len | ref  | rows | filtered | Extra                       |


+----+-------------+-------+------------+------+----------------+------+---------+------+------+----------+-----------------------------+


|  1 | SIMPLE      | test  | NULL       | ALL  | idx_test_c1234 | NULL | NULL    | NULL |    5 |    80.00 | Using where; Using filesort |


+----+-------------+-------+------------+------+----------------+------+---------+------+------+----------+-----------------------------+


1 row in set, 1 warning (0.00 sec)

分析: 在第一个索引字段用范围查询的话,很有可能是不走索引的,会产生全表扫描, 在看 排序时,又出现了Using filesort,因此建议在用联合索引的第一个字段的检索时尽量精确。
对于这种情况,我就是要用联合索引的第一个字段做范围查询,如何优化?
可以采用覆盖索引, 如下:


mysql> explain select c1 from test where c1>'a1' order by c1;


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+


| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                    |


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+


|  1 | SIMPLE      | test  | NULL       | range | idx_test_c1234 | idx_test_c1234 | 33      | NULL |    4 |   100.00 | Using where; Using index |


+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+


1 row in set, 1 warning (0.00 sec)

Case 7


mysql> explain select * from test order by c1 asc, c2 desc;


+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+


| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |


+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+


|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | Using filesort |


+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+


1 row in set, 1 warning (0.00 sec)

分析: 虽然排序的字段索引顺序一样(order by默认升序),这里c2 desc变成了降序,导致与索引的排序方式不同(要么是升序,要么是降序),从而产生Using filesort

in 和 exists 优化

in

原则:小表驱动大表,即小的数据集驱动大的数据集

select * from A where id in (select id from B)

B表的数据集小于A表的数据集时,in优于existsin是后面的一张表先执行,看个伪代码可能更容易理解


for(int i = 0; i < 5; i++){


for(int j = 0; j < 1000; j++) {


// do something


}


}

通过以上的伪代码来分析, 假设B表查询出来的结果有5条数据(B=i),A表数据有1000条(A=j),用A表和B表做关联查询的话,只需要到B表关联查5次。如果将两者交换一下,B表查出来1000条数据,A表只有5条数据,那么关联需要查1000次,肯定第一种方案效果更高一些,极大的减少了匹配查询的次数。

exists

select * from A where exists (select 1 from B where B.id = A.id)

A表的数据集小于B表的数据集时,exists优于inexists 是前面一张表先执行。

exists 子查询往往也可以用join来代替,哪种最优需要具体问题具体分析

总结

  • MySQL支持两种方式的排序filesortindexUsing index是指MySQL扫描索引本身完成排序。index效率高,filesort效率低

  • order by满足两种情况会使用Using index

    • order by语句使用索引最左前列

    • 使用where子句order by子句条件列组合满足索引最左前列

  • 尽量在索引列上完成排序,遵循索引建立(索引创建的顺序)时的最佳左前缀法则

  • 如果order by的条件不在索引列上,就会产生Using filesort

  • group byorder by很类似,其实是先排序后分组,遵照索引创建顺序的最佳左前缀法则。注意where高于having,能写在where中的限定条件就不要去having限定了

最后一次编辑于  07-19  
点赞 1
收藏
评论