PostgreSQL: Get Distinct Array Elements
I’ve worked with systems where arrays accumulate duplicates—maybe a user’s skill list has “Python” listed twice, or tags have been added inconsistently. You need to clean that up: remove duplicates, get a proper set of unique values, and ideally sort them for consistency. It’s more than just cosmetic; duplicates can break reporting, cause incorrect comparisons, and make your data look sloppy.
PostgreSQL’s array functions make this straightforward once you understand the pattern: unnest the array, apply DISTINCT, and reaggregate. It’s elegant and efficient.
Quick Answer: Use ARRAY_AGG(DISTINCT …)
To get only distinct array elements:
SELECT
emp_id,
first_name,
skills,
array_agg(DISTINCT skill ORDER BY skill) as unique_skills
FROM employees,
LATERAL UNNEST(skills) AS skill
WHERE skills IS NOT NULL
GROUP BY emp_id, first_name, skills
LIMIT 5;
Returns:
emp_id | first_name | skills | unique_skills
-------+------------+---------------------------+-----------------------------
1 | James | {Python,Java,Python,SQL} | {Java,Python,SQL}
2 | Mary | {Java,Go,Java,Go} | {Go,Java}
Duplicates removed, and results are sorted alphabetically.
How It Works
UNNEST converts the array into individual rows. ARRAY_AGG(DISTINCT ...) reassembles them, keeping only unique values. The ORDER BY sorts the result in the same operation.
Different Approaches & When to Use Them
Approach 1: ARRAY_AGG with DISTINCT (Simplest)
SELECT
emp_id,
array_agg(DISTINCT tag ORDER BY tag) as unique_tags
FROM employee_tags
WHERE tag IS NOT NULL
GROUP BY emp_id;
Best for: Straightforward deduplication.
Approach 2: Create Set-Like Behavior
SELECT
emp_id,
ARRAY(SELECT DISTINCT UNNEST(skills) ORDER BY 1) as unique_ordered_skills
FROM employees
WHERE skills IS NOT NULL;
Using ARRAY() constructor explicitly.
Approach 3: With Filter Conditions
Remove duplicates but only for specific values:
SELECT
emp_id,
array_agg(skill ORDER BY skill) as all_skills,
array_agg(DISTINCT skill) FILTER (WHERE skill LIKE 'Python%') as python_variants
FROM employees,
LATERAL UNNEST(skills) AS skill
GROUP BY emp_id;
Best for: Complex filtering while deduplicating.
Approach 4: Distinct and Count
See how many times each element appears:
SELECT
emp_id,
skill,
COUNT(*) as occurrences
FROM employees,
LATERAL UNNEST(skills) AS skill
WHERE skills IS NOT NULL
GROUP BY emp_id, skill
HAVING COUNT(*) > 1 -- Only show duplicates
ORDER BY emp_id, occurrences DESC;
Best for: Analyzing duplication patterns.
Real-World Example: Tag Deduplication
Consolidate duplicate tags in a tagging system:
SELECT
post_id,
title,
tags as original_tags,
array_agg(DISTINCT tag ORDER BY tag) as cleaned_tags,
ARRAY_LENGTH(tags, 1) as original_count,
array_length(array_agg(DISTINCT tag), 1) as unique_count
FROM posts,
LATERAL UNNEST(tags) AS tag
WHERE tags IS NOT NULL
GROUP BY post_id, title, tags
ORDER BY post_id;
Now you know exactly how much duplication exists and can clean data.
Update Columns with Distinct Values
Permanently clean arrays:
UPDATE employees
SET skills = (
SELECT array_agg(DISTINCT skill ORDER BY skill)
FROM UNNEST(skills) AS skill
)
WHERE skills IS NOT NULL;
-- Verify
SELECT emp_id, skills FROM employees LIMIT 5;
Performance Notes
| Operation | Speed | Notes |
|---|---|---|
| ARRAY_AGG(DISTINCT) | Medium | Fast for small arrays |
| UNNEST + GROUP BY | Medium | Same complexity |
| ARRAY() constructor | Medium | Alternative syntax |
All are efficient for typical array sizes (under 1000 elements).
Common Patterns
-- Simple distinct
array_agg(DISTINCT element ORDER BY element)
-- Distinct with custom sort
array_agg(DISTINCT element ORDER BY LENGTH(element), element)
-- Distinct then aggregate count
array_agg(DISTINCT element), COUNT(DISTINCT element)
-- Conditional distinct
array_agg(DISTINCT element) FILTER (WHERE condition)
-- Nested: distinct of distinct
SELECT DISTINCT unnest(unique_arrays) FROM (
SELECT array_agg(DISTINCT elem) as unique_arrays
FROM table
)
FAQ
Q: Does DISTINCT preserve order?
A: No, use ORDER BY to control order: array_agg(DISTINCT elem ORDER BY elem).
Q: Can I keep duplicates but see which are duplicated?
A: Use COUNT(*) > 1 in HAVING after GROUP BY.
Q: What about NULL values?
A: DISTINCT includes NULLs. Remove them first: ARRAY_REMOVE(array, NULL).
Q: Performance for very large arrays? A: UNNEST can be slow for 100k+ element arrays. Consider database design instead.
Q: Can I distinct based on array element property? A: Not directly. UNNEST first, then filter or GROUP BY the property.
Q: How do I deduplicate without sorting?
A: Use ARRAY() without ORDER BY: ARRAY(SELECT DISTINCT UNNEST(array)).
Wrapping Up
Getting distinct array elements requires expanding the array with UNNEST, deduplicating with ARRAY_AGG(DISTINCT …), and reassembling. It’s a standard pattern that works reliably.
The key: UNNEST → DISTINCT → ARRAY_AGG = clean, unique values.
Related Articles
- PostgreSQL Arrays - Complete array guide
- Remove Element from Array - Element removal
- Sort Array Elements - Sorting arrays