PostgreSQL Data Types: Complete Reference Guide

Every database column needs a data type. You might think this is just a technicality, but choosing the right data type affects query performance, storage size, and what operations you can perform. PostgreSQL offers far more data types than most databases, giving you precision and control.

This guide covers everything from basic numbers and strings to advanced JSON, arrays, and specialized types. You’ll learn not just the syntax, but when and why to use each type.

What You’ll Learn

This guide covers:

  • Numeric types - INTEGER, BIGINT, DECIMAL, REAL for different number requirements
  • String types - VARCHAR, TEXT, CHAR and when each is best
  • Date/Time types - DATE, TIMESTAMP, INTERVAL for temporal data
  • JSON & JSONB - Storing and querying semi-structured data
  • Arrays - Multiple values in a single column
  • Serial types - Auto-incrementing primary keys
  • UUID - Globally unique identifiers
  • Specialized types - Boolean, bytea, money, and more

Why Data Types Matter

Bad data type choices lead to real problems:

  1. Performance: numeric comparisons on INTEGER are far faster than comparing numbers stored as TEXT
  2. Storage: BIGINT uses twice the space of INTEGER. TEXT is flexible but can bloat databases
  3. Correctness: Using TEXT for dates prevents date arithmetic. Using REAL for money causes rounding errors
  4. Constraints: VARCHAR(50) prevents entering data that’s too long. TEXT allows anything

We’ll use the employee database that appears throughout this tutorial. You’re already familiar with its structure; here we’ll look at why each column uses the data type it does.

Understanding PostgreSQL Data Types

PostgreSQL is strict about types. You can’t compare a date to typed text without explicit conversion. You can’t store a fractional value, or a number beyond the supported range, in an INTEGER column. This strictness prevents bugs.
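
For example, the strictness surfaces as soon as types mix without a cast (a minimal illustration):

-- Date arithmetic works; comparing a date to typed text does not
SELECT '2024-01-15'::DATE + 30;         -- 2024-02-14
SELECT DATE '2024-01-15' = 'x'::TEXT;   -- ERROR: operator does not exist: date = text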

Each data type has:

  • Storage size - How many bytes it occupies (affects query speed and disk space)
  • Range - What values it can hold
  • Operations - What you can do with it (arithmetic, comparison, string functions)
  • Performance - How fast queries using it run

Data Type Categories

PostgreSQL groups types into categories:

  • Numeric - Numbers (integers, decimals, floats)
  • String - Text data
  • Date/Time - Temporal information
  • Boolean - True/false values
  • Geometric - Points, lines, polygons
  • Network - IP addresses, MAC addresses
  • Bit - Fixed- and variable-length bit strings
  • JSON - Semi-structured data
  • Arrays - Multiple values
  • Range - Continuous ranges
  • UUID - Unique identifiers
  • Special - money, bytea, and more

Numeric Data Types

Numeric types are for storing numbers. PostgreSQL offers several, each with different precision, range, and storage requirements.

INTEGER (int, int4)

Standard whole numbers from -2,147,483,648 to 2,147,483,647 (4 bytes).

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    stock_quantity INTEGER NOT NULL,
    rating INTEGER CHECK (rating >= 1 AND rating <= 5)
);

-- Insert examples
INSERT INTO products (product_id, stock_quantity, rating) VALUES
(1, 250, 5),
(2, 0, 3),
(3, 1500, 4);

SELECT * FROM products;

Result:

product_id | stock_quantity | rating
-----------+----------------+-------
          1 |            250 |      5
          2 |              0 |      3
          3 |           1500 |      4

Use INTEGER for:

  • Counts and quantities
  • IDs and references
  • Ratings and scores
  • Any whole number where the range -2B to 2B is sufficient
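
That strictness pays off at the boundaries: a value outside INTEGER’s range is rejected rather than silently truncated. Continuing the products table above:

INSERT INTO products (product_id, stock_quantity, rating)
VALUES (4, 3000000000, 5);
-- ERROR: integer out of range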

BIGINT (int8)

Very large whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (8 bytes).

-- Employee salaries are usually fine with INTEGER,
-- but suppose we track cumulative payroll across all employees:
CREATE TABLE company_payroll (
    year INTEGER,
    total_paid_this_year BIGINT
);

INSERT INTO company_payroll (year, total_paid_this_year) VALUES
(2023, 2450000),  -- $2.45 million
(2024, 2650000);  -- $2.65 million

SELECT year, total_paid_this_year
FROM company_payroll;

Result:

year | total_paid_this_year
-----+---------------------
2023 |           2450000
2024 |           2650000

Use BIGINT when:

  • Numbers exceed 2 billion
  • Tracking very large sums
  • Dealing with millisecond timestamps
  • You need guaranteed future growth
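
The millisecond-timestamp case is concrete: epoch milliseconds exceeded INTEGER’s maximum within weeks of 1970, so BIGINT is required:

-- Current Unix time in milliseconds - far beyond INTEGER's range
SELECT (EXTRACT(EPOCH FROM NOW()) * 1000)::BIGINT AS epoch_ms;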

SMALLINT (int2)

Small whole numbers from -32,768 to 32,767 (2 bytes).

CREATE TABLE office_locations (
    location_id SMALLINT PRIMARY KEY,
    building_floors SMALLINT,
    parking_spaces SMALLINT
);

INSERT INTO office_locations (location_id, building_floors, parking_spaces) VALUES
(1, 12, 200),
(2, 8, 150);

SELECT * FROM office_locations;

Result:

location_id | building_floors | parking_spaces
------------+-----------------+---------------
          1 |              12 |            200
          2 |               8 |            150

Use SMALLINT when:

  • Numbers are guaranteed to be small (under 32K)
  • Saving storage space matters (rare in modern systems)
  • You’re designing for capacity constraints

DECIMAL (numeric)

Fixed-point numbers with arbitrary precision. Perfect for financial data where rounding errors are unacceptable.

-- Employee salaries and financial data
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    salary DECIMAL(10, 2),  -- 10 total digits, 2 after decimal
    bonus DECIMAL(8, 2)     -- Up to 999,999.99
);

INSERT INTO employees (emp_id, first_name, salary, bonus) VALUES
(1, 'James', 75000.50, 5000.00),
(2, 'Mary', 78000.00, 7500.25),
(3, 'David', 105000.99, 15000.75);

SELECT emp_id, first_name, salary, bonus,
       (salary + bonus) as total_comp
FROM employees;

Result:

emp_id | first_name | salary    | bonus    | total_comp
-------+------------+-----------+----------+-----------
     1 | James      |  75000.50 |  5000.00 |   80000.50
     2 | Mary       |  78000.00 |  7500.25 |   85500.25
     3 | David      | 105000.99 | 15000.75 |  120001.74

The DECIMAL(10, 2) means:

  • Maximum 10 total digits
  • Exactly 2 digits after the decimal point
  • Stores 99,999,999.99 as maximum
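
Scale handling is worth seeing once: input values are rounded to the declared scale, while values that exceed the precision are rejected outright:

SELECT 123.456::DECIMAL(10, 2);          -- 123.46 (rounded to two decimals)
SELECT 123456789012.34::DECIMAL(10, 2);  -- ERROR: numeric field overflow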

Use DECIMAL for:

  • Money (always use DECIMAL, never REAL)
  • Precise measurements
  • Financial calculations
  • Anything where rounding errors matter

REAL and DOUBLE PRECISION

Floating-point numbers for approximate values. Faster than DECIMAL but with rounding errors.

-- Scientific data, measurements where slight imprecision is acceptable
CREATE TABLE sensor_readings (
    reading_id INTEGER,
    temperature REAL,          -- 4 bytes
    pressure DOUBLE PRECISION  -- 8 bytes
);

INSERT INTO sensor_readings (reading_id, temperature, pressure) VALUES
(1, 98.6, 1013.25),
(2, 102.3, 1011.50);

SELECT * FROM sensor_readings;

Result:

reading_id | temperature | pressure
-----------+-------------+----------
          1 |       98.6  | 1013.25
          2 |      102.3  | 1011.5

Use REAL/DOUBLE PRECISION for:

  • Scientific calculations
  • Machine learning models
  • Situations where speed matters more than perfect precision
  • Never for money - rounding errors make DECIMAL the right choice there
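
The imprecision is easy to demonstrate. Note that unadorned literals like 0.1 default to exact NUMERIC in PostgreSQL, so a cast is needed to expose the float behavior:

SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION AS sum;
-- 0.30000000000000004 rather than exactly 0.3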

NUMERIC vs DECIMAL

They’re identical in PostgreSQL. Use whichever you prefer.

String Data Types

String types store text. PostgreSQL offers several with different performance characteristics.

VARCHAR(n)

Variable-length strings up to n characters.

CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    phone VARCHAR(20)
);

INSERT INTO employees (emp_id, first_name, last_name, email, phone) VALUES
(1, 'James', 'Wilson', 'james.wilson@example.com', '555-1234'),
(2, 'Mary', 'Johnson', 'mary.johnson@example.com', NULL);

SELECT * FROM employees;

Result:

emp_id | first_name | last_name |          email           |  phone
-------+------------+-----------+--------------------------+----------
     1 | James      | Wilson    | james.wilson@example.com | 555-1234
     2 | Mary       | Johnson   | mary.johnson@example.com | NULL

Use VARCHAR(n) when:

  • You know the maximum length (names, phone numbers, codes)
  • You want to prevent accidentally storing enormous strings
  • You want clarity about data constraints

The constraint is enforced. Attempting to insert 'Alexander Reginald Fitzgerald' (29 characters) into a VARCHAR(20) column fails with an error.

TEXT

Unlimited-length strings. No length constraint.

CREATE TABLE employee_notes (
    note_id INTEGER PRIMARY KEY,
    emp_id INTEGER,
    notes TEXT
);

INSERT INTO employee_notes (note_id, emp_id, notes) VALUES
(1, 1, 'High performer. Promoted to Senior Engineer in Q2 2024.'),
(2, 2, 'Excellent communication skills. Led training program for new hires. Can handle complex projects independently.');

SELECT * FROM employee_notes;

Result:

note_id | emp_id |                              notes
---------+--------+----------------------------------------------------------------
       1 |      1 | High performer. Promoted to Senior Engineer in Q2 2024.
       2 |      2 | Excellent communication skills. Led training program...

Use TEXT when:

  • Length is truly unknown (descriptions, comments, notes)
  • You don’t need to enforce a maximum
  • You want flexibility

CHAR(n)

Fixed-length strings, padded with spaces if shorter.

CREATE TABLE departments (
    dept_code CHAR(3) PRIMARY KEY,
    dept_name VARCHAR(50)
);

INSERT INTO departments (dept_code, dept_name) VALUES
('ENG', 'Engineering'),
('SAL', 'Sales');

SELECT dept_code, dept_name FROM departments;

Result:

dept_code | dept_name
----------+-------------
ENG       | Engineering
SAL       | Sales

Use CHAR(n) rarely - only for fixed-length codes like state abbreviations (always 2 chars), currency codes (always 3 chars), etc.

Performance Comparison

For typical queries, all three are fast. TEXT performs identically to VARCHAR - the length constraint is just validation. Use what makes sense for your data, not for performance reasons.

Date and Time Data Types

PostgreSQL handles temporal data precisely. You have four types with different precision levels.

DATE

Dates only, no time component. Ranges from 4713 BC to 5874897 AD.

CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    hire_date DATE,
    birth_date DATE
);

INSERT INTO employees (emp_id, first_name, hire_date, birth_date) VALUES
(1, 'James', '2020-01-15', '1985-06-22'),
(2, 'Mary', '2019-03-20', '1988-11-10');

SELECT emp_id, first_name, hire_date,
       EXTRACT(YEAR FROM hire_date) as hire_year,
       CURRENT_DATE - hire_date as days_employed
FROM employees;

Result:

emp_id | first_name | hire_date  | hire_year | days_employed
-------+------------+------------+-----------+---------------
      1 | James      | 2020-01-15 |      2020 |           2229
      2 | Mary       | 2019-03-20 |      2019 |           2530

(These figures assume CURRENT_DATE is 2026-02-21, the reference date used in this guide’s sample output.)

Use DATE when:

  • You only need the date (birth dates, hire dates, deadlines)
  • Time is irrelevant
  • You want to do date arithmetic (subtract dates, add intervals)
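
Date arithmetic is direct: adding an integer adds days, and subtracting two dates yields days:

SELECT CURRENT_DATE + 30 AS thirty_days_out;
SELECT DATE '2026-12-25' - CURRENT_DATE AS days_until;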

TIME

Time only, no date component. Range: 00:00:00 to 24:00:00, with microsecond precision.

CREATE TABLE office_hours (
    location_id INTEGER PRIMARY KEY,
    opening_time TIME,
    closing_time TIME
);

INSERT INTO office_hours (location_id, opening_time, closing_time) VALUES
(1, '08:00:00', '17:00:00'),
(2, '09:00:00', '18:00:00');

SELECT * FROM office_hours;

Result:

location_id | opening_time | closing_time
-------------+--------------+---------------
            1 | 08:00:00     | 17:00:00
            2 | 09:00:00     | 18:00:00

Use TIME when:

  • You’re storing just a time (work hours, meeting times)
  • The date isn’t relevant
  • In practice it’s rarely needed - most applications want both date and time

TIMESTAMP (without time zone)

Date and time together. Assumes all times are in the same unknown timezone.

CREATE TABLE employee_activities (
    activity_id INTEGER,
    emp_id INTEGER,
    activity TEXT,
    created_at TIMESTAMP
);

INSERT INTO employee_activities (activity_id, emp_id, activity, created_at) VALUES
(1, 1, 'Logged in', '2026-02-21 09:30:45'),
(2, 2, 'Submitted report', '2026-02-21 14:15:30');

SELECT * FROM employee_activities;

Result:

activity_id | emp_id | activity          | created_at
-----------+--------+-------------------+---------------------
            1 |      1 | Logged in         | 2026-02-21 09:30:45
            2 |      2 | Submitted report  | 2026-02-21 14:15:30

This is useful for application logs where the timezone is implicit.

TIMESTAMP WITH TIME ZONE

Date and time with timezone handling. Values are converted to UTC on input and displayed in the session’s timezone (which defaults to the server’s).

CREATE TABLE international_meetings (
    meeting_id INTEGER PRIMARY KEY,
    meeting_title VARCHAR(100),
    scheduled_at TIMESTAMP WITH TIME ZONE
);

INSERT INTO international_meetings (meeting_id, meeting_title, scheduled_at) VALUES
(1, 'Team Sync', '2026-03-15 14:00:00 EST'),
(2, 'Board Meeting', '2026-03-15 20:00:00 PST');

SELECT meeting_id, meeting_title, scheduled_at,
       scheduled_at AT TIME ZONE 'UTC' as utc_time
FROM international_meetings;

Result:

meeting_id | meeting_title |      scheduled_at      |      utc_time
-----------+---------------+------------------------+---------------------
         1 | Team Sync     | 2026-03-15 19:00:00+00 | 2026-03-15 19:00:00
         2 | Board Meeting | 2026-03-16 04:00:00+00 | 2026-03-16 04:00:00

Use TIMESTAMP WITH TIME ZONE when:

  • You have global users across timezones
  • You need accurate scheduling (correct choice for most applications)
  • Consistency matters across regions

Use TIMESTAMP (without zone) when:

  • All users are in one timezone
  • You’re logging application events in local time
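
You can watch the conversion happen by changing the session timezone - the stored instant never changes, only its display:

SET timezone = 'America/New_York';
SELECT NOW();  -- displayed with a -05/-04 offset
SET timezone = 'UTC';
SELECT NOW();  -- the same instant, displayed with +00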

INTERVAL

Duration between two times.

-- How long have employees worked?
SELECT emp_id, first_name, hire_date,
       NOW()::DATE - hire_date as days_worked,
       AGE(NOW(), hire_date) as employment_duration
FROM employees
LIMIT 3;

Result:

emp_id | first_name | hire_date  | days_worked | employment_duration
-------+------------+------------+-------------+---------------------
      1 | James      | 2020-01-15 |        2229 | 6 years 1 mon 6 days
      2 | Mary       | 2019-03-20 |        2530 | 6 years 11 mons 1 day

Use INTERVAL for:

  • Duration between times
  • Time calculations
  • AGE() function for human-readable durations
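
Intervals also compose directly in arithmetic:

SELECT NOW() + INTERVAL '90 days' AS review_due;
SELECT INTERVAL '1 year 3 months' * 2;   -- 2 years 6 mons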

JSON and JSONB Data Types

Store semi-structured data - flexible when the exact schema isn’t fixed.

JSON (text-based)

Stores JSON as text. Slower queries but preserves exact formatting.

CREATE TABLE employee_metadata (
    emp_id INTEGER PRIMARY KEY,
    extra_data JSON
);

INSERT INTO employee_metadata (emp_id, extra_data) VALUES
(1, '{"skills": ["Python", "PostgreSQL"], "certifications": ["AWS", "PMP"]}'),
(2, '{"skills": ["Java", "Spring"], "years_experience": 8}');

SELECT emp_id, extra_data
FROM employee_metadata;

Result:

emp_id | extra_data
-------+--------------------------------------------------------------------
      1 | {"skills": ["Python", "PostgreSQL"], "certifications": ["AWS", "PMP"]}
      2 | {"skills": ["Java", "Spring"], "years_experience": 8}

JSONB (binary format)

Stores JSON in a parsed binary format. Faster queries and indexing support, but whitespace, key order, and duplicate keys are not preserved.
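
The difference is easy to see - JSONB normalizes on input, while JSON keeps the text verbatim:

SELECT '{"b": 1, "a": 2, "a": 3}'::JSON;   -- {"b": 1, "a": 2, "a": 3}
SELECT '{"b": 1, "a": 2, "a": 3}'::JSONB;  -- {"a": 3, "b": 1}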

CREATE TABLE employee_settings (
    emp_id INTEGER PRIMARY KEY,
    preferences JSONB
);

INSERT INTO employee_settings (emp_id, preferences) VALUES
(1, '{"theme": "dark", "notifications": true, "language": "en"}'),
(2, '{"theme": "light", "notifications": false, "language": "es"}');

-- Query JSON data
SELECT emp_id,
       preferences->>'theme' as theme,
       preferences->'notifications' as notifications_enabled
FROM employee_settings;

Result:

emp_id | theme | notifications_enabled
-------+-------+-------------------
      1 | dark  | true
      2 | light | false

Use JSON when:

  • You need the original text preserved exactly (formatting, key order, duplicate keys)
  • You store the data once and rarely query inside it

Use JSONB when:

  • You need to query inside the JSON
  • You’ll be indexing JSON fields
  • Performance matters
  • Default choice for JSON storage
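
JSONB also supports targeted updates with functions like jsonb_set, so you can change one key without rewriting the whole document:

UPDATE employee_settings
SET preferences = jsonb_set(preferences, '{theme}', '"solarized"')
WHERE emp_id = 1;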

Array Data Types

Store multiple values in a single column.

Basic Arrays

Any type can be an array: INTEGER[], TEXT[], DATE[], etc.

CREATE TABLE skills_matrix (
    emp_id INTEGER PRIMARY KEY,
    programming_languages TEXT[],
    project_ids INTEGER[],
    certifications TEXT[]
);

INSERT INTO skills_matrix (emp_id, programming_languages, project_ids, certifications) VALUES
(1, ARRAY['Python', 'JavaScript', 'SQL'], ARRAY[101, 102, 105], ARRAY['AWS', 'Docker']),
(2, ARRAY['Java', 'C++'], ARRAY[103, 104], ARRAY['PMP']);

SELECT emp_id, programming_languages, certifications
FROM skills_matrix;

Result:

emp_id | programming_languages    | certifications
-------+--------------------------+---------------
     1 | {Python,JavaScript,SQL}  | {AWS,Docker}
     2 | {Java,C++}               | {PMP}

Query array elements:

-- Find all employees who know Python
SELECT emp_id, programming_languages
FROM skills_matrix
WHERE 'Python' = ANY(programming_languages);

Result:

emp_id | programming_languages
-------+-----------------------
      1 | {Python,JavaScript,SQL}

Use arrays when:

  • You have a known set of related values
  • You want simplicity over normalization
  • Performance of these queries isn’t critical

Note: If you’re often querying array elements, normalization (separate table) is usually better for performance.
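
When you do need row-per-value access, unnest() expands an array into rows without changing the schema:

-- One row per (employee, language) pair
SELECT emp_id, unnest(programming_languages) AS language
FROM skills_matrix;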

UUID Type

Universally Unique Identifiers - 128-bit values that are effectively unique across systems. The example below generates them with gen_random_uuid(), which is built in since PostgreSQL 13 (older versions can install the pgcrypto extension for the same function).

CREATE TABLE audit_log (
    log_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    emp_id INTEGER,
    action VARCHAR(50),
    timestamp TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

INSERT INTO audit_log (emp_id, action) VALUES
(1, 'Logged in'),
(2, 'Updated profile');

SELECT log_id, emp_id, action, timestamp
FROM audit_log;

Result:

                log_id                | emp_id | action          |      timestamp
--------------------------------------+--------+-----------------+---------------------
 550e8400-e29b-41d4-a716-446655440000 |      1 | Logged in       | 2026-02-21 10:30:15
 6ba7b810-9dad-11d1-80b4-00c04fd430c8 |      2 | Updated profile | 2026-02-21 10:31:22

Use UUID when:

  • Distributing data across multiple databases (UUID is unique globally)
  • You need unguessable IDs for security
  • You’re integrating with external systems
  • You want to generate IDs before inserting to the database

Use INTEGER/BIGINT with SERIAL when:

  • You need simple sequential IDs
  • You don’t need global uniqueness
  • You want smaller storage (4 bytes vs 16 bytes)

Serial and Identity Types

Auto-incrementing integers for primary keys.

SERIAL

Creates an INTEGER with auto-increment.

CREATE TABLE departments (
    dept_id SERIAL PRIMARY KEY,
    dept_name VARCHAR(100) NOT NULL
);

INSERT INTO departments (dept_name) VALUES
('Sales'),
('Engineering'),
('Marketing');

SELECT * FROM departments;

Result:

dept_id | dept_name
--------+-----------
      1 | Sales
      2 | Engineering
      3 | Marketing

SERIAL automatically increments starting at 1.

BIGSERIAL

For very large sequences (same as SERIAL but BIGINT).

CREATE TABLE transactions (
    transaction_id BIGSERIAL PRIMARY KEY,
    amount DECIMAL(10, 2),
    timestamp TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

INSERT INTO transactions (amount) VALUES (250.00), (125.50), (89.99);

SELECT * FROM transactions;

Result:

transaction_id | amount  | timestamp
---------------+---------+---------------------
              1 | 250.00  | 2026-02-21 10:35:00
              2 | 125.50  | 2026-02-21 10:35:01
              3 |  89.99  | 2026-02-21 10:35:02

Use SERIAL for:

  • Auto-incrementing primary keys
  • Simple, sequential IDs
  • Small to medium tables (under 2 billion rows)

Use BIGSERIAL when:

  • You’re close to 2 billion rows
  • You need guaranteed growth capacity

Boolean Type

True/false values. Occupies 1 byte.

CREATE TABLE employee_status (
    emp_id INTEGER PRIMARY KEY,
    is_active BOOLEAN,
    is_manager BOOLEAN,
    has_office BOOLEAN
);

INSERT INTO employee_status (emp_id, is_active, is_manager, has_office) VALUES
(1, TRUE, TRUE, TRUE),
(2, TRUE, FALSE, FALSE),
(3, FALSE, FALSE, NULL);

SELECT emp_id, is_active, is_manager, has_office
FROM employee_status;

Result:

emp_id | is_active | is_manager | has_office
-------+-----------+------------+-----------
      1 | t         | t          | t
      2 | t         | f          | f
      3 | f         | f          | NULL

PostgreSQL displays TRUE/FALSE as t/f. On input it accepts TRUE, FALSE, 't', 'f', 'yes', 'no', 'on', 'off', '1', '0'.

Use BOOLEAN for:

  • Binary flags and toggles
  • Membership and permissions
  • Anything true/false
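
In queries, boolean columns work directly as conditions - no explicit comparison needed:

-- "WHERE is_active" is idiomatic; "WHERE is_active = TRUE" is redundant
SELECT emp_id
FROM employee_status
WHERE is_active AND NOT is_manager;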

Specialized Types

PostgreSQL has several other types for specific use cases.

BYTEA (binary data)

Raw bytes - images, files, encrypted data.

CREATE TABLE document_storage (
    doc_id INTEGER PRIMARY KEY,
    doc_name VARCHAR(255),
    file_content BYTEA
);

-- In practice, you'd load binary files here
-- For this example, we'll use hex encoding
INSERT INTO document_storage (doc_id, doc_name, file_content) VALUES
(1, 'contract.pdf', E'\\x255044462d312e34');  -- PDF file header

SELECT doc_id, doc_name, length(file_content) as file_size_bytes
FROM document_storage;

Result:

doc_id | doc_name     | file_size_bytes
-------+--------------+----------------
     1 | contract.pdf |               8

Use BYTEA for:

  • Binary files (PDFs, images)
  • Encrypted data
  • Serialized objects

Usually it’s better to store files on disk and keep only the path in the database, but BYTEA works for small binaries.

MONEY

A fixed fractional-precision currency type. Input and output formatting follow the server’s lc_monetary locale setting.

CREATE TABLE salaries (
    emp_id INTEGER PRIMARY KEY,
    annual_salary MONEY
);

INSERT INTO salaries (emp_id, annual_salary) VALUES
(1, '$75,000.00'),
(2, '$105,000.50');

SELECT emp_id, annual_salary
FROM salaries;

Result:

emp_id | annual_salary
-------+---------------
      1 | $75,000.00
      2 | $105,000.50

Honestly, DECIMAL(10,2) is usually better than MONEY. MONEY’s formatting depends on the server’s lc_monetary locale, it doesn’t record which currency a value is in, and DECIMAL is more flexible and portable.

Common Errors & Solutions

Error: “value too long for type character varying(n)”

Problem:

CREATE TABLE employees (
    last_name VARCHAR(10)
);

INSERT INTO employees (last_name) VALUES ('Fitzgerald');  -- 10 characters, OK
INSERT INTO employees (last_name) VALUES ('Abercrombie'); -- 11 characters, ERROR

Solution:

Increase the VARCHAR limit or use TEXT:

-- Option 1: Increase VARCHAR limit
ALTER TABLE employees ALTER COLUMN last_name TYPE VARCHAR(50);

-- Option 2: Use TEXT for unlimited length
ALTER TABLE employees ALTER COLUMN last_name TYPE TEXT;

Error: “operator is not unique: date - unknown”

Problem:

SELECT CURRENT_DATE - DATE '2020-01-01';  -- Works: both operands are typed
SELECT CURRENT_DATE - INTERVAL '1 day';   -- Works
SELECT CURRENT_DATE - '2020-01-01';       -- ERROR: the untyped literal could be a date, an integer, or an interval

Solution:

Cast the value to the correct type:

SELECT CURRENT_DATE - '2020-01-01'::DATE;          -- shorthand cast
SELECT CURRENT_DATE - CAST('2020-01-01' AS DATE);  -- SQL-standard cast

Error: “invalid input syntax for type integer”

Problem:

CREATE TABLE scores (score INTEGER);
INSERT INTO scores VALUES ('abc');  -- ERROR - not a number

Solution:

Ensure the data is numeric:

INSERT INTO scores VALUES ('123');  -- OK, string that can convert
INSERT INTO scores VALUES (123);    -- OK, actual number

Rounding Errors with REAL/DOUBLE PRECISION

Problem:

-- Plain 0.1 + 0.2 = 0.3 is TRUE in PostgreSQL, because unadorned
-- literals are exact NUMERIC. Floats are where it breaks:
SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION = 0.3;  -- FALSE

Solution:

Use DECIMAL for exact arithmetic:

SELECT 0.1::DECIMAL + 0.2::DECIMAL = 0.3::DECIMAL;  -- TRUE

Performance Tips

1. Treat Length Limits as Constraints, Not Storage Optimizations

VARCHAR(50) is no faster than VARCHAR(1000), and neither pre-allocates space - PostgreSQL stores only the characters actually present. Pick limits that express real data rules, and use TEXT when no limit makes sense.

-- Good: the limit documents and enforces a rule
CREATE TABLE posts (
    title VARCHAR(255) NOT NULL,
    content TEXT NOT NULL  -- Unlimited is fine here
);

-- Avoid: arbitrary huge limits that enforce nothing
CREATE TABLE posts (
    title VARCHAR(10000),     -- Constrains nothing in practice
    content VARCHAR(1000000)  -- Just use TEXT
);

2. Use Proper Numeric Types

BIGINT doesn’t slow queries versus INTEGER, but it uses twice the storage. Use INTEGER unless you need the range.

-- Good: Appropriate for the domain
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,  -- 2B is enough
    total_sold_this_year BIGINT    -- Cumulative totals can exceed 2 billion
);

3. Index JSON Strategically

Querying unindexed JSONB is slow on large datasets:

-- Create index on JSON field
CREATE INDEX idx_employee_settings_preferences
  ON employee_settings USING GIN (preferences);

-- Now this query uses the index
SELECT * FROM employee_settings
WHERE preferences @> '{"theme": "dark"}';

4. Use JSONB, Not JSON

JSONB is faster for queries and supports indexing. Use it unless you have a specific reason to preserve whitespace.

FAQ

Q: Should I use VARCHAR or TEXT? A: TEXT when length is unknown (descriptions, notes). VARCHAR(n) when you want to enforce a maximum (names, codes, emails).

Q: What’s the difference between TIMESTAMP and TIMESTAMP WITH TIME ZONE? A: TIMESTAMP stores a wall-clock reading with no timezone context. TIMESTAMP WITH TIME ZONE converts to UTC on input and handles multiple timezones (correct for global systems). Both occupy 8 bytes, so there’s no performance penalty either way.

Q: When should I use DECIMAL vs REAL? A: DECIMAL for money (always). REAL for approximate values where speed matters (scientific data, ML models).

Q: Is UUID slower than INTEGER? A: UUID takes more storage (16 bytes vs 4-8 bytes) and comparisons are slower, but for most applications it’s negligible. Use UUID for distributed systems, INTEGER for simple sequential IDs.

Q: Can I change a column’s data type after creating it? A: Yes, with ALTER TABLE: ALTER TABLE employees ALTER COLUMN salary TYPE DECIMAL(10,2); But data must be convertible to the new type.

Q: Should I use ENUM for a fixed set of values? A: ENUM is useful for restricted values like status: CREATE TYPE status_type AS ENUM ('active', 'inactive', 'onleave'); But VARCHAR with a CHECK constraint also works.
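
A minimal sketch of the CHECK-constraint alternative (the employee_roster table here is hypothetical):

CREATE TABLE employee_roster (
    emp_id INTEGER PRIMARY KEY,
    status VARCHAR(10) NOT NULL
        CHECK (status IN ('active', 'inactive', 'onleave'))
);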

Q: What’s the difference between serial and identity? A: SERIAL uses a sequence behind the scenes. IDENTITY is the SQL standard way. Both auto-increment; IDENTITY is newer and slightly preferable but SERIAL is more common in existing PostgreSQL code.
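
For comparison, the identity syntax looks like this:

-- SQL-standard identity column (PostgreSQL 10+)
CREATE TABLE projects (
    project_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    project_name VARCHAR(100)
);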

Next Steps

You now understand PostgreSQL’s rich data type system. Choosing the right data types is foundational to database design - with this knowledge, you can design schemas that are both performant and maintainable.