- 本文地址: https://www.yangdx.com/2020/05/151.html
- 转载请注明出处
使用 sql 语句查询数据的时候,如果表里有好几个索引,mysql 优化器会自己判断使用哪个索引进行搜索。但是,where 条件比较复杂的时候,优化器使用的索引就不一定是最优索引了。
例如,最近有个需求,查询 4 月份每个商家的微信、支付宝订单额,sql 语句如下:
SELECT
o.client_id,
c.client_name,
SUM(IF(o.pay_by IN(4, 5), o.money, 0)) AS m_alipay,
SUM(IF(o.pay_by IN(1, 2, 3), o.money, 0)) AS m_wechat
FROM
orders o
JOIN clients c ON o.client_id = c.id
WHERE
o.pay_time >= UNIX_TIMESTAMP("2020-04-01")
AND o.pay_time < UNIX_TIMESTAMP("2020-04-30 23:59:59")
AND c.status = 1
GROUP BY
o.client_id
orders 订单表数据量 400 万条,clients 商家信息表 2000 条,这个 sql 语句执行花了差不多 26 秒。
通常我们会在慢查询 sql 语句前面加上 explain
关键字,查看执行计划:
mysql> EXPLAIN
-> SELECT
-> o.client_id,
-> c.client_name,
-> SUM(IF(o.pay_by IN(4, 5), o.money, 0)) AS m_alipay,
-> SUM(IF(o.pay_by IN(1, 2, 3), o.money, 0)) AS m_wechat
-> FROM
-> orders o
-> JOIN clients c ON o.client_id = c.id
-> WHERE
-> o.pay_time >= UNIX_TIMESTAMP("2020-04-01")
-> AND o.pay_time < UNIX_TIMESTAMP("2020-04-30 23:59:59")
-> AND c.status = 1
-> GROUP BY
-> o.client_id;
+----+-------------+-------+------+---------------------------------------+---------------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------------------------------+---------------+---------+-----------+------+----------------------------------------------+
| 1 | SIMPLE | c | ALL | PRIMARY | NULL | NULL | NULL | 715 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+---------------------------------------+---------------+---------+-----------+------+----------------------------------------------+
| 1 | SIMPLE | o | ref | idx_pay_time,idx_client_id,idx_lin... | idx_client_id | 99 | shop.c.id | 201 | Using where |
+----+-------------+-------+------+---------------------------------------+---------------+---------+-----------+------+----------------------------------------------+
2 rows in set, 1 warning (0.01 sec)
orders 订单表有好几个索引,如 idx_pay_time、idx_client_id,分别是 pay_time、client_id 字段的单列索引。在执行计划中可以看到,mysql 使用订单表的 idx_client_id 索引进行搜索,预估扫描的数据是 201 条,但是 4 月份订单不可能只有这么点!可见,优化器使用的 idx_client_id 并不是最优索引。
在 mysql 中,添加 force index()
可以强制 sql 查询时使用指定索引,如 force index(idx_pay_time)
强制使用 pay_time 字段的索引,sql 改成:
SELECT
o.client_id,
c.client_name,
SUM(IF(o.pay_by IN(4, 5), o.money, 0)) AS m_alipay,
SUM(IF(o.pay_by IN(1, 2, 3), o.money, 0)) AS m_wechat
FROM
orders o FORCE INDEX(idx_pay_time)
JOIN clients c ON o.client_id = c.id
WHERE
o.pay_time >= UNIX_TIMESTAMP("2020-04-01")
AND o.pay_time < UNIX_TIMESTAMP("2020-04-30 23:59:59")
AND c.status = 1
GROUP BY
o.client_id
在前面加上 explain
查看执行计划:
mysql> EXPLAIN
-> SELECT
-> o.client_id,
-> c.client_name,
-> SUM(IF(o.pay_by IN(4, 5), o.money, 0)) AS m_alipay,
-> SUM(IF(o.pay_by IN(1, 2, 3), o.money, 0)) AS m_wechat
-> FROM
-> orders o FORCE INDEX(idx_pay_time)
-> JOIN clients c ON o.client_id = c.id
-> WHERE
-> o.pay_time >= UNIX_TIMESTAMP("2020-04-01")
-> AND o.pay_time < UNIX_TIMESTAMP("2020-04-30 23:59:59")
-> AND c.status = 1
-> GROUP BY
-> o.client_id;
+----+-------------+-------+--------+---------------------------------------+--------------+---------+------------------+--------+-------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------------+--------------+---------+------------------+--------+-------------------------------------------------------+
| 1 | SIMPLE | o | range | idx_pay_time,idx_client_id,idx_lin... | idx_pay_time | 5 | NULL | 671630 | Using index condition; Using where; Using temporar... |
+----+-------------+-------+--------+---------------------------------------+--------------+---------+------------------+--------+-------------------------------------------------------+
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | shop.o.client_id | 1 | Using where |
+----+-------------+-------+--------+---------------------------------------+--------------+---------+------------------+--------+-------------------------------------------------------+
2 rows in set, 1 warning (0.01 sec)
这次 orders 订单表扫描的数据是 67 万多条,这个订单数才是接近真实情况的。
最后,执行 sql 查询,优化后的语句耗时 2.7 秒。
前来学习的,感谢!
太强了!文中的“在执行计划中可以看到,mysql 使用订单表的 idx_client_id 索引进行搜索,预估扫描的数据是 201 条,但是 4 月份订单不可能只有这么点!”这段话什么意思不太懂。好像是索引扫描的数据越多越好的感觉?
博主回复:
4月份订单有67万条,索引才命中201条,所以这个索引不是最优的。索引命中越多越好!