PostgreSQL: Select Every Nth Row

When you’re working with large datasets, sometimes you need to sample data—taking every 5th row, every 10th row, or every nth row for analysis, testing, or reporting. Maybe you’re selecting 10% of transactions for audit review, sampling customer records for surveys, or picking key points from time series data for visualization. It’s more efficient than loading all data and filtering in application code.

PostgreSQL makes this straightforward with window functions and modulo arithmetic. You assign row numbers, then use modulo to find every nth row. Once you understand this pattern, you’ll use it for all kinds of sampling scenarios.

The Problem

You have 30 employees and want to select every 5th row (rows 5, 10, 15, 20, 25, 30).

Solution 1: Using MOD with Row Number

SELECT
    emp_id,
    first_name,
    last_name,
    salary
FROM (
    SELECT
        emp_id,
        first_name,
        last_name,
        salary,
        ROW_NUMBER() OVER (ORDER BY emp_id) as rn
    FROM employees
) subquery
WHERE MOD(rn, 5) = 0
ORDER BY emp_id;

Result:

emp_id | first_name | last_name | salary
-------+------------+-----------+--------
     5 | Michael    | Davis     | 70000
    10 | Charles    | Anderson  | 92000
    15 | Thomas     | Harris    | 82000
    20 | Christopher| Lewis     | 78000
    25 | Nancy      | Walker    | 75000
    30 | Unknown    | Unknown   | 72000

How It Works

MOD(rn, 5) = 0 means: rows where the row number is evenly divisible by 5 (remainder of 0). This selects rows 5, 10, 15, 20, 25, 30.

Change the divisor to sample differently:

  • MOD(rn, 2) = 0 → every 2nd row (all even-numbered rows)
  • MOD(rn, 3) = 0 → every 3rd row
  • MOD(rn, 10) = 0 → every 10th row

Solution 2: Using FILTER for Different Starting Points

Select every nth row starting from a specific offset:

-- Select every 5th row starting from row 1 (1, 6, 11, 16, 21, 26...)
SELECT
    emp_id,
    first_name,
    salary
FROM (
    SELECT
        emp_id,
        first_name,
        salary,
        ROW_NUMBER() OVER (ORDER BY emp_id) as rn
    FROM employees
) subquery
WHERE MOD(rn - 1, 5) = 0
ORDER BY emp_id;

MOD(rn - 1, 5) = 0 shifts the pattern by 1, selecting rows 1, 6, 11, 16, 21, 26.

Solution 3: Sampling with LIMIT and OFFSET

Sample approximately every nth row without gaps:

-- Get 6 rows evenly distributed from 30 total rows
SELECT
    emp_id,
    first_name,
    salary
FROM employees
ORDER BY RANDOM()
LIMIT 6;

This randomly samples 6 employees (approximately every 5th row). Use RANDOM() for statistical sampling.

Real-World Example: Department Sampling

Select every 3rd employee from the Engineering department:

SELECT
    emp_id,
    first_name,
    last_name,
    salary,
    dept_name
FROM (
    SELECT
        e.emp_id,
        e.first_name,
        e.last_name,
        e.salary,
        d.dept_name,
        ROW_NUMBER() OVER (PARTITION BY d.dept_name ORDER BY e.emp_id) as dept_rn
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    WHERE d.dept_name = 'Engineering'
) subquery
WHERE MOD(dept_rn, 3) = 0
ORDER BY emp_id;

PARTITION BY dept_name resets the row count for each department, so “every 3rd” applies per department, not globally.

Performance Tips

  • Use ROW_NUMBER() for consistent ordering
  • Add INDEX on ORDER BY column for faster row numbering
  • Use PARTITION BY to sample within groups
  • MOD is fast - minimal performance overhead

Common Variations

-- Every nth row starting from row 1
WHERE MOD(rn - 1, n) = 0

-- Every nth row starting from row 0
WHERE MOD(rn, n) = 0

-- Every nth row in each group
WHERE MOD(rn, n) = 0 AND PARTITION BY group_column

FAQ

Q: Which method is fastest? A: MOD with ROW_NUMBER is fastest for large datasets.

Q: Can I sample across multiple groups? A: Yes, use PARTITION BY in the OVER clause to reset counting per group.

Q: What if I want random sampling? A: Use ORDER BY RANDOM() LIMIT n instead of MOD.