JOINs and Set Operations¶
JOINs combine rows from two or more tables based on related columns. Set operations combine result sets vertically. Both are fundamental to relational querying.
Key Facts¶
- INNER JOIN returns only matching rows from both tables
- LEFT JOIN returns all rows from left table, NULLs where no match on right
- CROSS JOIN produces Cartesian product (every row paired with every row)
- JOIN columns should always be indexed for performance
- UNION removes duplicates (sorting required); UNION ALL keeps duplicates (faster)
- MySQL lacks native FULL OUTER JOIN - emulate with UNION of LEFT and RIGHT JOINs
Patterns¶
INNER JOIN¶
LEFT JOIN (LEFT OUTER JOIN)¶
SELECT u.firstname, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Anti-join: find users with no orders
SELECT u.* FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;
CROSS JOIN¶
Self-JOIN¶
-- Employees and their managers
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Multi-Table JOINs¶
SELECT c.name AS city, co.name AS country, cl.language
FROM city c
JOIN country co ON c.country_code = co.code
JOIN country_language cl ON co.code = cl.country_code
WHERE cl.is_official = 'T';
JOIN with Aggregation¶
SELECT u.firstname, COUNT(pm.id) AS msg_count
FROM users u
LEFT JOIN private_messages pm ON u.id = pm.sender_id
GROUP BY u.id, u.firstname
ORDER BY msg_count DESC;
Set Operations¶
-- UNION: combine results, remove duplicates
SELECT city FROM customers UNION SELECT city FROM suppliers;
-- UNION ALL: combine results, keep duplicates (faster)
SELECT city FROM customers UNION ALL SELECT city FROM suppliers;
-- INTERSECT: rows in both (PostgreSQL; MySQL uses INNER JOIN or IN)
SELECT genre FROM rock_albums INTERSECT SELECT genre FROM metal_albums;
-- EXCEPT: rows in first but not second (MySQL: LEFT JOIN WHERE IS NULL)
SELECT genre FROM all_albums EXCEPT SELECT genre FROM jazz_albums;
FULL OUTER JOIN (MySQL Workaround)¶
-- MySQL doesn't support FULL OUTER JOIN directly
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.t1_id
UNION
SELECT * FROM t1 RIGHT JOIN t2 ON t1.id = t2.t1_id;
Gotchas¶
- Implicit join syntax (
FROM t1, t2 WHERE t1.id = t2.fk) is an anti-pattern - always use explicit JOIN - Set operations require same number of columns and compatible types
- Column names in UNION results come from the first query
- JOINs are generally faster than correlated subqueries
- Missing JOIN condition produces accidental Cartesian product (huge result set)
See Also¶
- [[subqueries-and-ctes]] - EXISTS as alternative to JOINs
- [[index-strategies]] - indexing FK columns for JOIN performance
- [[query-optimization-explain]] - Hash Join vs Merge Join vs Nested Loop