PostgreSQL Subqueries: Nested SELECT Statements for Complex Logic
A subquery (or inner query) is a SELECT statement nested inside another SELECT. Subqueries let you break complex problems into smaller, more manageable pieces.
What You’ll Learn
This guide covers:
- Scalar subqueries (returning one value)
- IN subqueries (checking membership)
- Correlated subqueries (referencing outer query)
- EXISTS subqueries (checking existence)
- When to use subqueries vs. JOINs
- Performance considerations
Understanding Subqueries
A subquery is a SELECT inside a WHERE, FROM, or SELECT clause.
Basic Subquery Concept
Find employees earning more than the average salary:
SELECT first_name, salary FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Breaking it down:
- Inner query:
SELECT AVG(salary) FROM employees— Returns 79,833 - Outer query: Find employees with salary > 79,833
Result:
first_name | salary
------------+----------
David | 105000.00
Jennifer | 102000.00
Susan | 99000.00
Richard | 98000.00
Joseph | 96000.00
Barbara | 95000.00
Charles | 92000.00
Mary | 78000.00
Patricia | 76000.00
(9 rows)
This is a scalar subquery—it returns a single value (79,833).
Types of Subqueries
1. Scalar Subqueries
Return a single value. Used with comparison operators.
Example: Compare to average
SELECT first_name, salary,
(SELECT AVG(salary) FROM employees) as avg_salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC
LIMIT 3;
Result:
first_name | salary | avg_salary
------------+---------+------------
David | 105000 | 79833.33
Jennifer | 102000 | 79833.33
Susan | 99000 | 79833.33
(3 rows)
The subquery SELECT AVG(salary) returns 79,833.33, used for comparison.
Example: Find employee closest to average age
SELECT first_name, EXTRACT(YEAR FROM AGE(hire_date)) as tenure FROM employees
WHERE EXTRACT(YEAR FROM AGE(hire_date)) =
(SELECT ROUND(AVG(EXTRACT(YEAR FROM AGE(hire_date))))
FROM employees)
ORDER BY first_name;
This finds employees with tenure equal to the average tenure.
2. IN Subqueries
Check if a value exists in a subquery result.
Example: Find employees in engineering department
SELECT first_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'Engineering');
Result:
first_name
------------
David
Jennifer
Richard
Barbara
Charles
Susan
Joseph
Jessica
(8 rows)
The subquery returns dept_id 2 (Engineering), and we find all employees in that department.
Example: Find employees who don’t work in high-revenue departments
Assuming we have a revenue table (not shown), we could do:
SELECT first_name, dept_id FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE location = 'San Francisco');
This finds employees not in the San Francisco office.
3. Correlated Subqueries
A subquery that references columns from the outer query. Executes once per outer row (slower, but powerful).
Example: Employees earning more than their department average
SELECT e1.first_name, e1.salary, e1.dept_id FROM employees e1
WHERE salary > (
SELECT AVG(salary) FROM employees e2
WHERE e2.dept_id = e1.dept_id
);
Breaking it down:
- Outer query: Each employee
- For each employee, correlated subquery: What’s the average salary in their department?
- WHERE: Is their salary greater than their department average?
Result:
first_name | salary | dept_id
------------+---------+---------
David | 105000 | 2
Jennifer | 102000 | 2
Susan | 99000 | 2
Richard | 98000 | 2
Joseph | 96000 | 2
Mary | 78000 | 1
Patricia | 76000 | 1
Thomas | 82000 | 3
Karen | 80000 | 3
Lisa | 79000 | 3
(10 rows)
These employees earn above their department average.
4. EXISTS Subqueries
Check if a subquery returns any rows. Often faster than IN for large datasets.
Example: Find departments with employees
SELECT dept_name FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);
Breaking it down:
- For each department, check: “Do any employees exist in this department?”
SELECT 1— The value doesn’t matter; we only check if rows exist- WHERE EXISTS — True if the subquery returns any rows
Result:
dept_name
-----------
Sales
Engineering
Marketing
Operations
(4 rows)
All departments have employees.
Example: Find departments WITHOUT employees
SELECT dept_name FROM departments d
WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);
This returns empty (all departments have employees in our test data).
Subqueries in FROM Clause
Treat a subquery result as a table.
Example: Find departments with above-average employee count
SELECT dept_summary.dept_id, dept_summary.count
FROM (
SELECT dept_id, COUNT(*) as count FROM employees
GROUP BY dept_id
) as dept_summary
WHERE count > (SELECT AVG(count) FROM (
SELECT COUNT(*) as count FROM employees GROUP BY dept_id
) as avg_calc);
This is complex, but it shows subqueries in FROM.
Simpler version using a Common Table Expression (CTE):
WITH dept_counts AS (
SELECT dept_id, COUNT(*) as count FROM employees GROUP BY dept_id
)
SELECT dept_id, count FROM dept_counts
WHERE count > (SELECT AVG(count) FROM dept_counts);
Subqueries in SELECT Clause
Return a value for each row.
Example: Employee salary vs. department average
SELECT first_name, salary,
(SELECT AVG(salary) FROM employees e2
WHERE e2.dept_id = e1.dept_id) as dept_avg,
salary - (SELECT AVG(salary) FROM employees e2
WHERE e2.dept_id = e1.dept_id) as diff_from_avg
FROM employees e1
ORDER BY first_name
LIMIT 5;
Result:
first_name | salary | dept_avg | diff_from_avg
------------+---------+---------+---------------
Ashley | 66000 | 61500 | 4500
Anthony | 67000 | 61500 | 5500
Barbara | 95000 | 95500 | -500
Betty | 70000 | 61500 | 8500
Charles | 92000 | 95500 | -3500
(5 rows)
Each row shows the employee’s salary, their department’s average, and the difference.
Subqueries vs. JOINs
Both can solve similar problems. When should you use each?
Example: Find Employees in Engineering Department
With Subquery:
SELECT first_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'Engineering');
With JOIN:
SELECT DISTINCT e.first_name FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_name = 'Engineering';
When to Use Subqueries
- Simpler syntax for some problems
- Aggregates in WHERE clause (scalar subqueries)
- Existence checks (EXISTS)
- Breaking complex problems into pieces
When to Use JOINs
- Combine related data from multiple tables
- Better performance on large datasets (usually)
- Multiple joins (clearer than nested subqueries)
- Industry standard (most developers expect JOINs)
Rule of thumb: Start with a JOIN if it makes sense. Use subqueries if it’s clearer or necessary.
Common Errors & Solutions
Error: “More than one row returned by subquery”
Problem:
SELECT first_name FROM employees
WHERE salary = (SELECT salary FROM employees WHERE dept_id = 2);
The subquery returns multiple rows (8 engineers with different salaries), but the comparison operator = expects one value.
Solution:
Use IN instead of =:
SELECT first_name FROM employees
WHERE salary IN (SELECT salary FROM employees WHERE dept_id = 2);
Or aggregate:
SELECT first_name FROM employees
WHERE salary = (SELECT AVG(salary) FROM employees WHERE dept_id = 2);
Error: “Invalid reference to FROM-clause variable”
Problem:
In a correlated subquery, you reference an alias that doesn’t exist:
SELECT first_name FROM employees e
WHERE EXISTS (SELECT 1 FROM salaries WHERE emp_id = s.emp_id);
The alias s isn’t defined in the FROM clause.
Solution:
Add the table to the subquery FROM:
SELECT first_name FROM employees e
WHERE EXISTS (SELECT 1 FROM salaries s WHERE s.emp_id = e.emp_id);
Error: “Subquery is returning different columns”
Problem:
SELECT first_name, (SELECT salary, hire_date FROM employees) FROM employees;
The subquery returns 2 columns, but you need exactly 1 for SELECT clause subqueries.
Solution:
Select only one column or use it in FROM:
-- One column
SELECT first_name, (SELECT salary FROM employees LIMIT 1) FROM employees;
-- Multiple columns in FROM
SELECT * FROM (SELECT first_name, salary FROM employees) as sub;
Performance Tips
1. Scalar Subqueries Can Be Slow
-- Slow: subquery executes for EVERY row
SELECT first_name,
(SELECT COUNT(*) FROM salaries WHERE emp_id = e.emp_id)
FROM employees e;
On 1 million employees, this runs 1 million subqueries!
2. EXISTS is Usually Faster Than IN
-- Better for large datasets
WHERE EXISTS (SELECT 1 FROM other_table WHERE ...)
-- Slower for large datasets
WHERE id IN (SELECT id FROM other_table WHERE ...)
EXISTS stops at the first match. IN evaluates all matches.
3. Consider JOINs for Complex Problems
-- Complex nested subqueries
SELECT * FROM employees e WHERE dept_id IN (
SELECT dept_id FROM departments WHERE manager_id IN (
SELECT emp_id FROM employees WHERE salary > 80000
)
);
-- Clearer with JOINs
SELECT DISTINCT e.* FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
JOIN employees mgr ON d.manager_id = mgr.emp_id
WHERE mgr.salary > 80000;
4. Use CTEs (WITH Clause) for Readability
-- Readable
WITH high_earners AS (
SELECT emp_id FROM salaries WHERE salary > 100000
)
SELECT * FROM employees WHERE emp_id IN (SELECT emp_id FROM high_earners);
-- Less readable
SELECT * FROM employees WHERE emp_id IN (
SELECT emp_id FROM salaries WHERE salary > 100000
);
FAQ
Q: Can a subquery return multiple columns? A: Yes, if it’s in FROM clause. Not if it’s in WHERE or SELECT.
Q: Are subqueries evaluated once or multiple times? A: Depends. Scalar subqueries in WHERE are usually evaluated once. Correlated subqueries evaluate once per outer row.
Q: Can subqueries be nested multiple levels deep? A: Yes, but readability suffers. Use CTEs instead.
Q: What’s the performance difference between subquery and JOIN? A: Modern query optimizers often treat them identically. Write whichever is clearer.
Q: Can I UPDATE with a subquery? A: Yes:
UPDATE employees SET salary = 75000 WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'NYC');
Q: What’s a CTE and how does it differ from a subquery? A: CTEs (WITH clause) are subqueries you name for clarity:
WITH named_subquery AS (SELECT ...)
SELECT * FROM named_subquery;
We cover CTEs in advanced PostgreSQL guides.
Next Steps
You now understand subqueries—a powerful tool for complex queries. Your next steps:
- PostgreSQL JOINs Complete Guide - Understand the alternative: combining tables
- DISTINCT and GROUP BY Explained - Review aggregation for subqueries
- Back to SELECT Guide - Solidify fundamentals
Subqueries are essential knowledge. Combine them with JOINs and you can answer virtually any question your data can provide!