MySQL 表的设计

napoleonu

2012-03-04 22:32:33 +08:00

当你深入了解MySQL的各种功能的实现方式，特征的时候应该就会更容易设计出比较优雅且性能好的表结构来。

就你上面的问题，我的一个小建议是做适当的数据冗余

例如
table subscription
*subscription_id, user_id,user_name, feed_id, ...
那么你上面的查询就是
select f.feed_name,f.feed_url from feed f,subscription s where s.user_name='Tom' and f.feed_id = s.feed_id;

再例如
table subscription
*subscription_id, user_id,user_name,feed_id, feed_name,feed_url, feed_description
那么你上面的查询就是
select feed_name,feed_url from subscription where user_name='Tom';

lenmore

2012-03-04 23:01:57 +08:00

LZ的表设计的非常好了~完全符合第三范式了啊。

查询时用feed_id in (...)来干就不会困惑了。有人会说In的性能问题，我想MySQL的查询优化器应该不会那么傻的吧。

@napoleonu 建议的数据冗余实在没必要，我倒是建议在程序里面完全用user_id来查询，username只是用来登陆和显示。

lepture

2012-03-04 23:04:58 +08:00

@lenmore 有索引的话 in 走索引还是不错的。没索引就悲剧了。

napoleonu

2012-03-04 23:54:46 +08:00

又不是计算机三级考试，不用在意范式的。现在计算机的计算能力和存储能力，范式已经不适合了。

in运算绝对禁止。

Tianpu

2012-03-05 01:01:46 +08:00

@napoleonu 请教select where in 的性能问题，去stackoverflow没找到，我大量使用了这个语句

napoleonu

2012-03-05 10:34:29 +08:00

说明：
1.in(list)运算，如果list是一个明确的项目列表，in(list)运算性能尚可。
2.in(list)运算，如果list是一个子查询，那么只有在满足 1)父查询是覆盖索引查询 2)过滤性够好(我的不知道错误还是正确的85%无效结果的排除率) 的时候父查询才会用的上索引，反之则用不上索引。

就如上面那句
select feed_name, feed_url from feed where feed_id = ( select feed_id from user,subscription where user.user_id = subscription.user_id and user_name = 'Tom');
大部分人的想法可能是首先子查询查询一堆feed_id 之后返回feed_id的列表给父查询
而实际上MySQL优化器并不会这么做，而是先对feed表进行全表扫描，或者如果有个 idx1(feed_id,feed_name, feed_url) 索引，则进行全索引扫描，之后把feed表的feed_id代入到子查询，效率有多低相信你应该知道。至少目前已经GA的版本都是这样，not in一样的方式。

mysql> show create table a\G
*************************** 1. row ***************************
Table: a
Create Table: CREATE TABLE `a` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`uid` int(10) NOT NULL,
`time` datetime NOT NULL,
`type` tinyint(4) DEFAULT NULL,
`status` varchar(320) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_uid` (`uid`)
) ENGINE=InnoDB AUTO_INCREMENT=500001 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

+----+-----+---------------------+------+-----------+
| id | uid | time | type | status |
+----+-----+---------------------+------+-----------+
| 1 | 1 | 2012-01-16 17:34:24 | 1 | vvvvvvvvv |
| 2 | 1 | 2012-01-16 17:34:10 | 2 | vvvvvvvvv |
| 3 | 1 | 2012-01-11 12:16:56 | 0 | vvvvvvvvv |
| 4 | 1 | 2012-01-16 17:34:24 | 1 | vvvvvvvvv |
| 5 | 1 | 2012-01-16 17:34:10 | 2 | vvvvvvvvv |
+----+-----+---------------------+------+-----------+

mysql> select * from c;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+----+
10 rows in set (0.00 sec)

1.如果list是一个明确的项目列表
mysql> explain select count(*) from a where uid in (1,2,3,4,5,6,7,8,9,10)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
type: range
possible_keys: idx_uid
key: idx_uid
key_len: 4
ref: NULL
rows: 4996
Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> select sql_no_cache count(id) from b where uid in (1,2,3,4,5,6,7,8,9,10);
+-----------+
| count(id) |
+-----------+
| 5000 |
+-----------+
1 row in set (0.00 sec)

2.如果list是一个子查询，满足 1)父查询是覆盖索引查询 2)过滤性够好父查询用的上索引
mysql> explain select count(uid) from a where uid in (select id from c)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: a
type: index
possible_keys: NULL
key: idx_uid
key_len: 4
ref: NULL
rows: 500101
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: c
type: unique_subquery
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: func
rows: 1
Extra: Using index
2 rows in set (0.00 sec)

mysql> select sql_no_cache count(uid) from a where uid in (select id from c);
+------------+
| count(uid) |
+------------+
| 5000 |
+------------+
1 row in set (1.61 sec)

2.如果list是一个子查询，不满足 1)父查询是覆盖索引查询 2)过滤性够好父查询用不上索引
mysql> explain select count(time) from a where uid in (select id from c)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: a
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 500101
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: c
type: unique_subquery
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: func
rows: 1
Extra: Using index
2 rows in set (0.00 sec)

mysql> select sql_no_cache count(time) from a where uid in (select id from c);
+-------------+
| count(time) |
+-------------+
| 5000 |
+-------------+
1 row in set (1.68 sec)

napoleonu

2012-03-05 10:42:19 +08:00

@Tianpu @lepture @lenmore @toothpaste

测试看来，如果in(list)中list是一个子查询，不管父查询有没有用上索引，效果都很差。

lepture

2012-03-05 11:24:29 +08:00

@napoleonu 怎么能再来个子查询呢？嵌套自然会慢。这个不能说明in的效率。

napoleonu

2012-03-05 11:53:00 +08:00

@lepture 你看下顶楼是怎么写的。

lepture

2012-03-05 12:10:30 +08:00

@napoleonu sorry 。那他怎么都快不了了。

toothpaste

2012-03-05 12:51:32 +08:00

@napoleonu 谢谢！

Tianpu

2012-03-05 17:06:32 +08:00

@napoleonu 感谢我也动手测试了下

我使用的是确定的序列

测试20个字段的列

table test:
key unique
valuea
valueb

select * from `test` where `key` in (a,b,c,d,....);
3.3秒

select * from `test` where `key` in (a,b,c,d,....);
8.9秒

如果完全分解开来
select * from `test` where `key`='a';
select * from `test` where `key`='b';
select * from `test` where `key`='c';
select * from `test` where `key`='d';
...
这个语句比较多，php批量查询，时间是0.003秒

看来select where in还是不用的好，多个快速的查询比单个查询竟然有百倍的差距

测试表有2200万行的数据虽不是很多也能说明一定的问题

Tianpu

2012-03-05 17:08:57 +08:00

第二个 select * from `test` where `key` in (a,b,c,d,....); 应为
select `key` from `test` where `key` in (a,b,c,d,....);

glume

2012-03-05 20:41:22 +08:00

In 太坑爹了。我的教训也是用了一个in查询用户ID，在一个论坛上。结果网站当天就趴了。