Faking independent subqueries in MySQL
So one of my associates laid this little gem on me yesterday that kinda floored me.
Say you have a query that counts the number of members in a group that looks like this:
select count(*) as members from users where id in (select user_id from memberships where group_id = 1);
You may know that this query can take a long time as the number of rows in the users table grows. Why? Take a look at the explanation:
+----+--------------------+-------------+-------+---------------+---------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------------+-------+---------------+---------+---------+------+------+--------------------------+ | 1 | PRIMARY | users | index | NULL | PRIMARY | 4 | NULL | 1 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | memberships | ALL | NULL | NULL | NULL | NULL | 1 | Using where | +----+--------------------+-------------+-------+---------------+---------+---------+------+------+--------------------------+
That inner query is dependent, which means it’ll get run for each user in the table, even though it really doesn’t have to be. It would be nice to store this in a temporary table, but that’s a pain right? Or is it? Check out this “optimization”:
select count(*) as members from users where id in (select user_id from (select user_id from memberships where group_id = 1) as x);
That’s an extra query right? Why would that help? Turns out that wrapping the dependent subquery like this creates a derived table. MySQL just creates that temporary table for you for free. Here’s the explanation.
+----+--------------------+-------------+--------+---------------+---------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------------+--------+---------------+---------+---------+------+------+--------------------------+ | 1 | PRIMARY | users | index | NULL | PRIMARY | 4 | NULL | 1 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | <derived3> | system | NULL | NULL | NULL | NULL | 0 | const row not found | | 3 | DERIVED | memberships | ALL | NULL | NULL | NULL | NULL | 1 | Using where | +----+--------------------+-------------+--------+---------------+---------+---------+------+------+--------------------------+
So now instead of running that query against the memberships table it runs it against the derived table. This derived table only contains one column, resides in memory and never changes allowing it to run much much faster without changing the query structure too heavily.
Update: This may not be the best example by itself. Let me clarify my additional requirement of counting all the users that aren’t in the given group as well. The above query handles that easily via a “not”:
select count(*) as members from users where id NOT in (select user_id from (select user_id from memberships where group_id = 1) as x);







