PostgreSQL Subqueries: Scalar, IN, and Correlated Subqueries Guide

PostgreSQL Subqueries: Nested SELECT Statements for Complex Logic

A subquery (or inner query) is a SELECT statement nested inside another SELECT. Subqueries let you break complex problems into smaller, more manageable pieces.

What You’ll Learn

This guide covers:

Scalar subqueries (returning one value)
IN subqueries (checking membership)
Correlated subqueries (referencing outer query)
EXISTS subqueries (checking existence)
When to use subqueries vs. JOINs
Performance considerations

Understanding Subqueries

A subquery is a SELECT inside a WHERE, FROM, or SELECT clause.

Basic Subquery Concept

Find employees earning more than the average salary:

SELECT first_name, salary FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Breaking it down:

Inner query: SELECT AVG(salary) FROM employees — Returns 79,833
Outer query: Find employees with salary > 79,833

Result:

 first_name | salary
------------+----------
 David      | 105000.00
 Jennifer   | 102000.00
 Susan      | 99000.00
 Richard    | 98000.00
 Joseph     | 96000.00
 Barbara    | 95000.00
 Charles    | 92000.00
 Mary       | 78000.00
 Patricia   | 76000.00
(9 rows)

This is a scalar subquery—it returns a single value (79,833).

Types of Subqueries

1. Scalar Subqueries

Return a single value. Used with comparison operators.

Example: Compare to average

SELECT first_name, salary,
       (SELECT AVG(salary) FROM employees) as avg_salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC
LIMIT 3;

Result:

 first_name | salary  | avg_salary
------------+---------+------------
 David      | 105000  | 79833.33
 Jennifer   | 102000  | 79833.33
 Susan      | 99000   | 79833.33
(3 rows)

The subquery SELECT AVG(salary) returns 79,833.33, used for comparison.

Example: Find employee closest to average age

SELECT first_name, EXTRACT(YEAR FROM AGE(hire_date)) as tenure FROM employees
WHERE EXTRACT(YEAR FROM AGE(hire_date)) =
      (SELECT ROUND(AVG(EXTRACT(YEAR FROM AGE(hire_date))))
       FROM employees)
ORDER BY first_name;

This finds employees with tenure equal to the average tenure.

2. IN Subqueries

Check if a value exists in a subquery result.

Example: Find employees in engineering department

SELECT first_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'Engineering');

Result:

 first_name
------------
 David
 Jennifer
 Richard
 Barbara
 Charles
 Susan
 Joseph
 Jessica
(8 rows)

The subquery returns dept_id 2 (Engineering), and we find all employees in that department.

Example: Find employees who don’t work in high-revenue departments

Assuming we have a revenue table (not shown), we could do:

SELECT first_name, dept_id FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE location = 'San Francisco');

This finds employees not in the San Francisco office.

3. Correlated Subqueries

A subquery that references columns from the outer query. Executes once per outer row (slower, but powerful).

Example: Employees earning more than their department average

SELECT e1.first_name, e1.salary, e1.dept_id FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2
    WHERE e2.dept_id = e1.dept_id
);

Breaking it down:

Outer query: Each employee
For each employee, correlated subquery: What’s the average salary in their department?
WHERE: Is their salary greater than their department average?

Result:

 first_name | salary  | dept_id
------------+---------+---------
 David      | 105000  |       2
 Jennifer   | 102000  |       2
 Susan      | 99000   |       2
 Richard    | 98000   |       2
 Joseph     | 96000   |       2
 Mary       | 78000   |       1
 Patricia   | 76000   |       1
 Thomas     | 82000   |       3
 Karen      | 80000   |       3
 Lisa       | 79000   |       3
(10 rows)

These employees earn above their department average.

4. EXISTS Subqueries

Check if a subquery returns any rows. Often faster than IN for large datasets.

Example: Find departments with employees

SELECT dept_name FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);

Breaking it down:

For each department, check: “Do any employees exist in this department?”
SELECT 1 — The value doesn’t matter; we only check if rows exist
WHERE EXISTS — True if the subquery returns any rows

Result:

 dept_name
-----------
 Sales
 Engineering
 Marketing
 Operations
(4 rows)

All departments have employees.

Example: Find departments WITHOUT employees

SELECT dept_name FROM departments d
WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);

This returns empty (all departments have employees in our test data).

Subqueries in FROM Clause

Treat a subquery result as a table.

Example: Find departments with above-average employee count

SELECT dept_summary.dept_id, dept_summary.count
FROM (
    SELECT dept_id, COUNT(*) as count FROM employees
    GROUP BY dept_id
) as dept_summary
WHERE count > (SELECT AVG(count) FROM (
    SELECT COUNT(*) as count FROM employees GROUP BY dept_id
) as avg_calc);

This is complex, but it shows subqueries in FROM.

Simpler version using a Common Table Expression (CTE):

WITH dept_counts AS (
    SELECT dept_id, COUNT(*) as count FROM employees GROUP BY dept_id
)
SELECT dept_id, count FROM dept_counts
WHERE count > (SELECT AVG(count) FROM dept_counts);

Subqueries in SELECT Clause

Return a value for each row.

Example: Employee salary vs. department average

SELECT first_name, salary,
       (SELECT AVG(salary) FROM employees e2
        WHERE e2.dept_id = e1.dept_id) as dept_avg,
       salary - (SELECT AVG(salary) FROM employees e2
                 WHERE e2.dept_id = e1.dept_id) as diff_from_avg
FROM employees e1
ORDER BY first_name
LIMIT 5;

Result:

 first_name | salary  | dept_avg | diff_from_avg
------------+---------+---------+---------------
 Ashley     | 66000   | 61500   | 4500
 Anthony    | 67000   | 61500   | 5500
 Barbara    | 95000   | 95500   | -500
 Betty      | 70000   | 61500   | 8500
 Charles    | 92000   | 95500   | -3500
(5 rows)

Each row shows the employee’s salary, their department’s average, and the difference.

Subqueries vs. JOINs

Both can solve similar problems. When should you use each?

Example: Find Employees in Engineering Department

With Subquery:

SELECT first_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'Engineering');

With JOIN:

SELECT DISTINCT e.first_name FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_name = 'Engineering';

When to Use Subqueries

Simpler syntax for some problems
Aggregates in WHERE clause (scalar subqueries)
Existence checks (EXISTS)
Breaking complex problems into pieces

When to Use JOINs

Combine related data from multiple tables
Better performance on large datasets (usually)
Multiple joins (clearer than nested subqueries)
Industry standard (most developers expect JOINs)

Rule of thumb: Start with a JOIN if it makes sense. Use subqueries if it’s clearer or necessary.

Common Errors & Solutions

Error: “More than one row returned by subquery”

Problem:

SELECT first_name FROM employees
WHERE salary = (SELECT salary FROM employees WHERE dept_id = 2);

The subquery returns multiple rows (8 engineers with different salaries), but the comparison operator = expects one value.

Solution:

Use IN instead of =:

SELECT first_name FROM employees
WHERE salary IN (SELECT salary FROM employees WHERE dept_id = 2);

Or aggregate:

SELECT first_name FROM employees
WHERE salary = (SELECT AVG(salary) FROM employees WHERE dept_id = 2);

Error: “Invalid reference to FROM-clause variable”

Problem:

In a correlated subquery, you reference an alias that doesn’t exist:

SELECT first_name FROM employees e
WHERE EXISTS (SELECT 1 FROM salaries WHERE emp_id = s.emp_id);

The alias s isn’t defined in the FROM clause.

Solution:

Add the table to the subquery FROM:

SELECT first_name FROM employees e
WHERE EXISTS (SELECT 1 FROM salaries s WHERE s.emp_id = e.emp_id);

Error: “Subquery is returning different columns”

Problem:

SELECT first_name, (SELECT salary, hire_date FROM employees) FROM employees;

The subquery returns 2 columns, but you need exactly 1 for SELECT clause subqueries.

Solution:

Select only one column or use it in FROM:

-- One column
SELECT first_name, (SELECT salary FROM employees LIMIT 1) FROM employees;

-- Multiple columns in FROM
SELECT * FROM (SELECT first_name, salary FROM employees) as sub;

Performance Tips

1. Scalar Subqueries Can Be Slow

-- Slow: subquery executes for EVERY row
SELECT first_name,
       (SELECT COUNT(*) FROM salaries WHERE emp_id = e.emp_id)
FROM employees e;

On 1 million employees, this runs 1 million subqueries!

2. EXISTS is Usually Faster Than IN

-- Better for large datasets
WHERE EXISTS (SELECT 1 FROM other_table WHERE ...)

-- Slower for large datasets
WHERE id IN (SELECT id FROM other_table WHERE ...)

EXISTS stops at the first match. IN evaluates all matches.

3. Consider JOINs for Complex Problems

-- Complex nested subqueries
SELECT * FROM employees e WHERE dept_id IN (
    SELECT dept_id FROM departments WHERE manager_id IN (
        SELECT emp_id FROM employees WHERE salary > 80000
    )
);

-- Clearer with JOINs
SELECT DISTINCT e.* FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
JOIN employees mgr ON d.manager_id = mgr.emp_id
WHERE mgr.salary > 80000;

4. Use CTEs (WITH Clause) for Readability

-- Readable
WITH high_earners AS (
    SELECT emp_id FROM salaries WHERE salary > 100000
)
SELECT * FROM employees WHERE emp_id IN (SELECT emp_id FROM high_earners);

-- Less readable
SELECT * FROM employees WHERE emp_id IN (
    SELECT emp_id FROM salaries WHERE salary > 100000
);

FAQ

Q: Can a subquery return multiple columns? A: Yes, if it’s in FROM clause. Not if it’s in WHERE or SELECT.

Q: Are subqueries evaluated once or multiple times? A: Depends. Scalar subqueries in WHERE are usually evaluated once. Correlated subqueries evaluate once per outer row.

Q: Can subqueries be nested multiple levels deep? A: Yes, but readability suffers. Use CTEs instead.

Q: What’s the performance difference between subquery and JOIN? A: Modern query optimizers often treat them identically. Write whichever is clearer.

Q: Can I UPDATE with a subquery? A: Yes:

UPDATE employees SET salary = 75000 WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'NYC');

Q: What’s a CTE and how does it differ from a subquery? A: CTEs (WITH clause) are subqueries you name for clarity:

WITH named_subquery AS (SELECT ...)
SELECT * FROM named_subquery;

We cover CTEs in advanced PostgreSQL guides.

Next Steps

You now understand subqueries—a powerful tool for complex queries. Your next steps:

PostgreSQL JOINs Complete Guide - Understand the alternative: combining tables
DISTINCT and GROUP BY Explained - Review aggregation for subqueries
Back to SELECT Guide - Solidify fundamentals

Subqueries are essential knowledge. Combine them with JOINs and you can answer virtually any question your data can provide!