Skip to main content

Subqueries & CTEs

What are subqueries?

A subquery is a query nested inside another query. It lets you use the result of one query as part of another.

-- Find users who have expenses
SELECT * FROM users
WHERE id IN (
SELECT DISTINCT user_id FROM expenses WHERE user_id IS NOT NULL
);

The inner query (SELECT DISTINCT user_id FROM expenses) runs first, and its results are used in the outer query's WHERE clause.

Subqueries in where

The most common use case: filtering based on results from another table.

-- Users who have spent more than $100
SELECT * FROM users
WHERE id IN (
SELECT user_id
FROM expenses
GROUP BY user_id
HAVING SUM(amount) > 100
);

EXISTS vs IN

EXISTS is often more efficient than IN for subqueries:

-- Using IN
SELECT * FROM users
WHERE id IN (SELECT user_id FROM expenses);

-- Using EXISTS (often faster)
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM expenses WHERE expenses.user_id = users.id
);

EXISTS stops as soon as it finds one matching row. IN must check all values.

Comparison operators with subqueries

-- Users older than the average age
SELECT * FROM users
WHERE age > (SELECT AVG(age) FROM users);

The subquery must return a single value for comparison operators (>, <, =, etc.).

Subqueries in select

You can use subqueries to calculate values in your SELECT clause:

-- Show user name and their total spending
SELECT
name,
(SELECT SUM(amount) FROM expenses WHERE expenses.user_id = users.id) as total_spent
FROM users;

This adds a calculated column showing each user's total expenses.

Common table expressions (CTEs)

A CTE (Common Table Expression) is a named temporary result set that exists only for the duration of a query. Think of it as a "temporary view."

CTEs use the WITH keyword:

WITH recent_orders AS (
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT COUNT(*) FROM recent_orders;

Why CTEs are powerful

CTEs make complex queries readable by breaking them into logical steps:

-- Without CTE: hard to read
SELECT
u.name,
COUNT(e.id) as expense_count
FROM users u
INNER JOIN expenses e ON u.id = e.user_id
WHERE e.created_at > (
SELECT MAX(created_at) - INTERVAL '30 days'
FROM expenses
)
GROUP BY u.id, u.name
HAVING COUNT(e.id) > 5;

-- With CTE: much clearer
WITH recent_expenses AS (
SELECT * FROM expenses
WHERE created_at > NOW() - INTERVAL '30 days'
),
active_users AS (
SELECT user_id, COUNT(*) as expense_count
FROM recent_expenses
GROUP BY user_id
HAVING COUNT(*) > 5
)
SELECT u.name, au.expense_count
FROM users u
INNER JOIN active_users au ON u.id = au.user_id;

Multiple CTEs

You can chain multiple CTEs:

WITH 
recent_expenses AS (
SELECT * FROM expenses
WHERE created_at > NOW() - INTERVAL '30 days'
),
user_totals AS (
SELECT user_id, SUM(amount) as total
FROM recent_expenses
GROUP BY user_id
),
top_spenders AS (
SELECT * FROM user_totals
ORDER BY total DESC
LIMIT 10
)
SELECT u.name, ts.total
FROM top_spenders ts
INNER JOIN users u ON ts.user_id = u.id;

Each CTE builds on the previous one, making the logic clear and easy to follow.

CTEs vs subqueries

Both solve similar problems, but CTEs are often preferred because they're:

  • More readable: Named steps instead of nested queries
  • Reusable: You can reference a CTE multiple times in the same query
  • Easier to debug: Test each CTE independently
-- Subquery: nested and harder to read
SELECT * FROM users
WHERE id IN (
SELECT user_id FROM expenses
WHERE amount > (SELECT AVG(amount) FROM expenses)
);

-- CTE: clear steps
WITH avg_expense AS (
SELECT AVG(amount) as avg_amount FROM expenses
),
large_expenses AS (
SELECT DISTINCT user_id
FROM expenses, avg_expense
WHERE amount > avg_expense.avg_amount
)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM large_expenses);

Common mistakes

Subquery returns multiple rows

-- ❌ Error: subquery returns multiple rows
SELECT * FROM users
WHERE age > (SELECT age FROM users WHERE name LIKE 'A%');

-- ✅ Fix: use aggregation or IN
SELECT * FROM users
WHERE age > (SELECT MAX(age) FROM users WHERE name LIKE 'A%');

Forgetting to correlate subqueries

-- ❌ Wrong: subquery doesn't reference outer query
SELECT
name,
(SELECT SUM(amount) FROM expenses) as total_spent
FROM users;
-- This sums ALL expenses for every user

-- ✅ Correct: correlate with user_id
SELECT
name,
(SELECT SUM(amount) FROM expenses WHERE expenses.user_id = users.id) as total_spent
FROM users;

Overusing subqueries

Sometimes a JOIN is clearer:

-- Subquery (works, but less efficient)
SELECT name,
(SELECT SUM(amount) FROM expenses WHERE expenses.user_id = users.id) as total
FROM users;

-- JOIN (often better)
SELECT u.name, SUM(e.amount) as total
FROM users u
LEFT JOIN expenses e ON u.id = e.user_id
GROUP BY u.id, u.name;

Readability > cleverness

The best query is the one that's easiest to understand and maintain. CTEs help with this:

-- Clear, step-by-step logic
WITH
recent_expenses AS (SELECT * FROM expenses WHERE created_at > NOW() - INTERVAL '7 days'),
category_totals AS (
SELECT category, SUM(amount) as total
FROM recent_expenses
GROUP BY category
)
SELECT * FROM category_totals
ORDER BY total DESC;