PostgreSQL Subqueries: Nested SELECT Statements for Complex Logic

A subquery (or inner query) is a SELECT statement nested inside another SELECT. Subqueries let you break complex problems into smaller, more manageable pieces.

What You’ll Learn

This guide covers:

  • Scalar subqueries (returning one value)
  • IN subqueries (checking membership)
  • Correlated subqueries (referencing outer query)
  • EXISTS subqueries (checking existence)
  • When to use subqueries vs. JOINs
  • Performance considerations

Understanding Subqueries

A subquery is a SELECT inside a WHERE, FROM, or SELECT clause.

Basic Subquery Concept

Find employees earning more than the average salary:

SELECT first_name, salary FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Breaking it down:

  1. Inner query: SELECT AVG(salary) FROM employees — Returns 79,833
  2. Outer query: Find employees with salary > 79,833

Result:

 first_name | salary
------------+----------
 David      | 105000.00
 Jennifer   | 102000.00
 Susan      | 99000.00
 Richard    | 98000.00
 Joseph     | 96000.00
 Barbara    | 95000.00
 Charles    | 92000.00
 Mary       | 78000.00
 Patricia   | 76000.00
(9 rows)

This is a scalar subquery—it returns a single value (79,833).

Types of Subqueries

1. Scalar Subqueries

Return a single value. Used with comparison operators.

Example: Compare to average

SELECT first_name, salary,
       (SELECT AVG(salary) FROM employees) as avg_salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC
LIMIT 3;

Result:

 first_name | salary  | avg_salary
------------+---------+------------
 David      | 105000  | 79833.33
 Jennifer   | 102000  | 79833.33
 Susan      | 99000   | 79833.33
(3 rows)

The subquery SELECT AVG(salary) returns 79,833.33, used for comparison.

Example: Find employee closest to average age

SELECT first_name, EXTRACT(YEAR FROM AGE(hire_date)) as tenure FROM employees
WHERE EXTRACT(YEAR FROM AGE(hire_date)) =
      (SELECT ROUND(AVG(EXTRACT(YEAR FROM AGE(hire_date))))
       FROM employees)
ORDER BY first_name;

This finds employees with tenure equal to the average tenure.

2. IN Subqueries

Check if a value exists in a subquery result.

Example: Find employees in engineering department

SELECT first_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'Engineering');

Result:

 first_name
------------
 David
 Jennifer
 Richard
 Barbara
 Charles
 Susan
 Joseph
 Jessica
(8 rows)

The subquery returns dept_id 2 (Engineering), and we find all employees in that department.

Example: Find employees who don’t work in high-revenue departments

Assuming we have a revenue table (not shown), we could do:

SELECT first_name, dept_id FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE location = 'San Francisco');

This finds employees not in the San Francisco office.

3. Correlated Subqueries

A subquery that references columns from the outer query. Executes once per outer row (slower, but powerful).

Example: Employees earning more than their department average

SELECT e1.first_name, e1.salary, e1.dept_id FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2
    WHERE e2.dept_id = e1.dept_id
);

Breaking it down:

  1. Outer query: Each employee
  2. For each employee, correlated subquery: What’s the average salary in their department?
  3. WHERE: Is their salary greater than their department average?

Result:

 first_name | salary  | dept_id
------------+---------+---------
 David      | 105000  |       2
 Jennifer   | 102000  |       2
 Susan      | 99000   |       2
 Richard    | 98000   |       2
 Joseph     | 96000   |       2
 Mary       | 78000   |       1
 Patricia   | 76000   |       1
 Thomas     | 82000   |       3
 Karen      | 80000   |       3
 Lisa       | 79000   |       3
(10 rows)

These employees earn above their department average.

4. EXISTS Subqueries

Check if a subquery returns any rows. Often faster than IN for large datasets.

Example: Find departments with employees

SELECT dept_name FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);

Breaking it down:

  1. For each department, check: “Do any employees exist in this department?”
  2. SELECT 1 — The value doesn’t matter; we only check if rows exist
  3. WHERE EXISTS — True if the subquery returns any rows

Result:

 dept_name
-----------
 Sales
 Engineering
 Marketing
 Operations
(4 rows)

All departments have employees.

Example: Find departments WITHOUT employees

SELECT dept_name FROM departments d
WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);

This returns empty (all departments have employees in our test data).

Subqueries in FROM Clause

Treat a subquery result as a table.

Example: Find departments with above-average employee count

SELECT dept_summary.dept_id, dept_summary.count
FROM (
    SELECT dept_id, COUNT(*) as count FROM employees
    GROUP BY dept_id
) as dept_summary
WHERE count > (SELECT AVG(count) FROM (
    SELECT COUNT(*) as count FROM employees GROUP BY dept_id
) as avg_calc);

This is complex, but it shows subqueries in FROM.

Simpler version using a Common Table Expression (CTE):

WITH dept_counts AS (
    SELECT dept_id, COUNT(*) as count FROM employees GROUP BY dept_id
)
SELECT dept_id, count FROM dept_counts
WHERE count > (SELECT AVG(count) FROM dept_counts);

Subqueries in SELECT Clause

Return a value for each row.

Example: Employee salary vs. department average

SELECT first_name, salary,
       (SELECT AVG(salary) FROM employees e2
        WHERE e2.dept_id = e1.dept_id) as dept_avg,
       salary - (SELECT AVG(salary) FROM employees e2
                 WHERE e2.dept_id = e1.dept_id) as diff_from_avg
FROM employees e1
ORDER BY first_name
LIMIT 5;

Result:

 first_name | salary  | dept_avg | diff_from_avg
------------+---------+---------+---------------
 Ashley     | 66000   | 61500   | 4500
 Anthony    | 67000   | 61500   | 5500
 Barbara    | 95000   | 95500   | -500
 Betty      | 70000   | 61500   | 8500
 Charles    | 92000   | 95500   | -3500
(5 rows)

Each row shows the employee’s salary, their department’s average, and the difference.

Subqueries vs. JOINs

Both can solve similar problems. When should you use each?

Example: Find Employees in Engineering Department

With Subquery:

SELECT first_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'Engineering');

With JOIN:

SELECT DISTINCT e.first_name FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_name = 'Engineering';

When to Use Subqueries

  • Simpler syntax for some problems
  • Aggregates in WHERE clause (scalar subqueries)
  • Existence checks (EXISTS)
  • Breaking complex problems into pieces

When to Use JOINs

  • Combine related data from multiple tables
  • Better performance on large datasets (usually)
  • Multiple joins (clearer than nested subqueries)
  • Industry standard (most developers expect JOINs)

Rule of thumb: Start with a JOIN if it makes sense. Use subqueries if it’s clearer or necessary.

Common Errors & Solutions

Error: “More than one row returned by subquery”

Problem:

SELECT first_name FROM employees
WHERE salary = (SELECT salary FROM employees WHERE dept_id = 2);

The subquery returns multiple rows (8 engineers with different salaries), but the comparison operator = expects one value.

Solution:

Use IN instead of =:

SELECT first_name FROM employees
WHERE salary IN (SELECT salary FROM employees WHERE dept_id = 2);

Or aggregate:

SELECT first_name FROM employees
WHERE salary = (SELECT AVG(salary) FROM employees WHERE dept_id = 2);

Error: “Invalid reference to FROM-clause variable”

Problem:

In a correlated subquery, you reference an alias that doesn’t exist:

SELECT first_name FROM employees e
WHERE EXISTS (SELECT 1 FROM salaries WHERE emp_id = s.emp_id);

The alias s isn’t defined in the FROM clause.

Solution:

Add the table to the subquery FROM:

SELECT first_name FROM employees e
WHERE EXISTS (SELECT 1 FROM salaries s WHERE s.emp_id = e.emp_id);

Error: “Subquery is returning different columns”

Problem:

SELECT first_name, (SELECT salary, hire_date FROM employees) FROM employees;

The subquery returns 2 columns, but you need exactly 1 for SELECT clause subqueries.

Solution:

Select only one column or use it in FROM:

-- One column
SELECT first_name, (SELECT salary FROM employees LIMIT 1) FROM employees;

-- Multiple columns in FROM
SELECT * FROM (SELECT first_name, salary FROM employees) as sub;

Performance Tips

1. Scalar Subqueries Can Be Slow

-- Slow: subquery executes for EVERY row
SELECT first_name,
       (SELECT COUNT(*) FROM salaries WHERE emp_id = e.emp_id)
FROM employees e;

On 1 million employees, this runs 1 million subqueries!

2. EXISTS is Usually Faster Than IN

-- Better for large datasets
WHERE EXISTS (SELECT 1 FROM other_table WHERE ...)

-- Slower for large datasets
WHERE id IN (SELECT id FROM other_table WHERE ...)

EXISTS stops at the first match. IN evaluates all matches.

3. Consider JOINs for Complex Problems

-- Complex nested subqueries
SELECT * FROM employees e WHERE dept_id IN (
    SELECT dept_id FROM departments WHERE manager_id IN (
        SELECT emp_id FROM employees WHERE salary > 80000
    )
);

-- Clearer with JOINs
SELECT DISTINCT e.* FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
JOIN employees mgr ON d.manager_id = mgr.emp_id
WHERE mgr.salary > 80000;

4. Use CTEs (WITH Clause) for Readability

-- Readable
WITH high_earners AS (
    SELECT emp_id FROM salaries WHERE salary > 100000
)
SELECT * FROM employees WHERE emp_id IN (SELECT emp_id FROM high_earners);

-- Less readable
SELECT * FROM employees WHERE emp_id IN (
    SELECT emp_id FROM salaries WHERE salary > 100000
);

FAQ

Q: Can a subquery return multiple columns? A: Yes, if it’s in FROM clause. Not if it’s in WHERE or SELECT.

Q: Are subqueries evaluated once or multiple times? A: Depends. Scalar subqueries in WHERE are usually evaluated once. Correlated subqueries evaluate once per outer row.

Q: Can subqueries be nested multiple levels deep? A: Yes, but readability suffers. Use CTEs instead.

Q: What’s the performance difference between subquery and JOIN? A: Modern query optimizers often treat them identically. Write whichever is clearer.

Q: Can I UPDATE with a subquery? A: Yes:

UPDATE employees SET salary = 75000 WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'NYC');

Q: What’s a CTE and how does it differ from a subquery? A: CTEs (WITH clause) are subqueries you name for clarity:

WITH named_subquery AS (SELECT ...)
SELECT * FROM named_subquery;

We cover CTEs in advanced PostgreSQL guides.

Next Steps

You now understand subqueries—a powerful tool for complex queries. Your next steps:

Subqueries are essential knowledge. Combine them with JOINs and you can answer virtually any question your data can provide!