PostgreSQL Distinct Array Elements: Remove Duplicate Values

PostgreSQL: Get Distinct Array Elements

I’ve worked with systems where arrays accumulate duplicates—maybe a user’s skill list has “Python” listed twice, or tags have been added inconsistently. You need to clean that up: remove duplicates, get a proper set of unique values, and ideally sort them for consistency. It’s more than just cosmetic; duplicates can break reporting, cause incorrect comparisons, and make your data look sloppy.

PostgreSQL’s array functions make this straightforward once you understand the pattern: unnest the array, apply DISTINCT, and reaggregate. It’s elegant and efficient.

Quick Answer: Use ARRAY_AGG(DISTINCT …)

To get only distinct array elements:

SELECT
    emp_id,
    first_name,
    skills,
    array_agg(DISTINCT skill ORDER BY skill) as unique_skills
FROM employees,
LATERAL UNNEST(skills) AS skill
WHERE skills IS NOT NULL
GROUP BY emp_id, first_name, skills
LIMIT 5;

Returns:

emp_id | first_name | skills                    | unique_skills
-------+------------+---------------------------+-----------------------------
     1 | James      | {Python,Java,Python,SQL}  | {Java,Python,SQL}
     2 | Mary       | {Java,Go,Java,Go}         | {Go,Java}

Duplicates removed, and results are sorted alphabetically.

How It Works

UNNEST converts the array into individual rows. ARRAY_AGG(DISTINCT ...) reassembles them, keeping only unique values. The ORDER BY sorts the result in the same operation.

Different Approaches & When to Use Them

Approach 1: ARRAY_AGG with DISTINCT (Simplest)

SELECT
    emp_id,
    array_agg(DISTINCT tag ORDER BY tag) as unique_tags
FROM employee_tags
WHERE tag IS NOT NULL
GROUP BY emp_id;

Best for: Straightforward deduplication.

Approach 2: Create Set-Like Behavior

SELECT
    emp_id,
    ARRAY(SELECT DISTINCT UNNEST(skills) ORDER BY 1) as unique_ordered_skills
FROM employees
WHERE skills IS NOT NULL;

Using ARRAY() constructor explicitly.

Approach 3: With Filter Conditions

Remove duplicates but only for specific values:

SELECT
    emp_id,
    array_agg(skill ORDER BY skill) as all_skills,
    array_agg(DISTINCT skill) FILTER (WHERE skill LIKE 'Python%') as python_variants
FROM employees,
LATERAL UNNEST(skills) AS skill
GROUP BY emp_id;

Best for: Complex filtering while deduplicating.

Approach 4: Distinct and Count

See how many times each element appears:

SELECT
    emp_id,
    skill,
    COUNT(*) as occurrences
FROM employees,
LATERAL UNNEST(skills) AS skill
WHERE skills IS NOT NULL
GROUP BY emp_id, skill
HAVING COUNT(*) > 1  -- Only show duplicates
ORDER BY emp_id, occurrences DESC;

Best for: Analyzing duplication patterns.

Real-World Example: Tag Deduplication

Consolidate duplicate tags in a tagging system:

SELECT
    post_id,
    title,
    tags as original_tags,
    array_agg(DISTINCT tag ORDER BY tag) as cleaned_tags,
    ARRAY_LENGTH(tags, 1) as original_count,
    array_length(array_agg(DISTINCT tag), 1) as unique_count
FROM posts,
LATERAL UNNEST(tags) AS tag
WHERE tags IS NOT NULL
GROUP BY post_id, title, tags
ORDER BY post_id;

Now you know exactly how much duplication exists and can clean data.

Update Columns with Distinct Values

Permanently clean arrays:

UPDATE employees
SET skills = (
    SELECT array_agg(DISTINCT skill ORDER BY skill)
    FROM UNNEST(skills) AS skill
)
WHERE skills IS NOT NULL;

-- Verify
SELECT emp_id, skills FROM employees LIMIT 5;

Performance Notes

Operation	Speed	Notes
ARRAY_AGG(DISTINCT)	Medium	Fast for small arrays
UNNEST + GROUP BY	Medium	Same complexity
ARRAY() constructor	Medium	Alternative syntax

All are efficient for typical array sizes (under 1000 elements).

Common Patterns

-- Simple distinct
array_agg(DISTINCT element ORDER BY element)

-- Distinct with custom sort
array_agg(DISTINCT element ORDER BY LENGTH(element), element)

-- Distinct then aggregate count
array_agg(DISTINCT element), COUNT(DISTINCT element)

-- Conditional distinct
array_agg(DISTINCT element) FILTER (WHERE condition)

-- Nested: distinct of distinct
SELECT DISTINCT unnest(unique_arrays) FROM (
    SELECT array_agg(DISTINCT elem) as unique_arrays
    FROM table
)

FAQ

Q: Does DISTINCT preserve order? A: No, use ORDER BY to control order: array_agg(DISTINCT elem ORDER BY elem).

Q: Can I keep duplicates but see which are duplicated? A: Use COUNT(*) > 1 in HAVING after GROUP BY.

Q: What about NULL values? A: DISTINCT includes NULLs. Remove them first: ARRAY_REMOVE(array, NULL).

Q: Performance for very large arrays? A: UNNEST can be slow for 100k+ element arrays. Consider database design instead.

Q: Can I distinct based on array element property? A: Not directly. UNNEST first, then filter or GROUP BY the property.

Q: How do I deduplicate without sorting? A: Use ARRAY() without ORDER BY: ARRAY(SELECT DISTINCT UNNEST(array)).

Wrapping Up

Getting distinct array elements requires expanding the array with UNNEST, deduplicating with ARRAY_AGG(DISTINCT …), and reassembling. It’s a standard pattern that works reliably.

The key: UNNEST → DISTINCT → ARRAY_AGG = clean, unique values.

PostgreSQL Arrays - Complete array guide
Remove Element from Array - Element removal
Sort Array Elements - Sorting arrays

PostgreSQL: Get Distinct Array Elements - Remove Duplicates