Item 2 – Use citext for Case-Insensitive Identifiers

The Problem: Case Sensitivity Creates User Experience Nightmares

You’ve built a user registration system. Alice signs up with [email protected]. A week later, she tries to log in but types [email protected]. Your login fails because PostgreSQL’s TEXT type is case-sensitive:

-- Alice's account
INSERT INTO users (email, password_hash) 
VALUES ('[email protected]', '$2b$12$...');

-- Alice tries to log in
SELECT * FROM users 
WHERE email = '[email protected]';
-- Returns 0 rows! Authentication fails.

This is frustrating for users and creates duplicate account problems:

-- Alice creates a "new" account with different casing
INSERT INTO users (email, password_hash) 
VALUES ('[email protected]', '$2b$12$...');
-- ✓ Success! Now Alice has two accounts.

The Naive Solution: LOWER() Everywhere

The common workaround is to use LOWER() in queries:

-- Store everything in lowercase
INSERT INTO users (email, password_hash) 
VALUES (LOWER('[email protected]'), '$2b$12$...');

-- Query with LOWER()
SELECT * FROM users 
WHERE LOWER(email) = LOWER('[email protected]');

Problems with this approach:

Problem 1: You Must Remember LOWER() Everywhere

One forgotten LOWER() and you have a bug:

-- ❌ Developer forgets LOWER() in a new feature
SELECT * FROM users 
WHERE email = '[email protected]';  -- Returns nothing

-- ❌ Background job doesn't use LOWER()
UPDATE users SET last_login = NOW() 
WHERE email = user_input;  -- Case-sensitive match fails

Problem 2: Can’t Use Standard Indexes

A regular index on email won’t help queries with LOWER():

CREATE INDEX idx_users_email ON users(email);

-- This query can't use the index!
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- Seq Scan on users (SLOW!)

You need a functional index instead:

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now this query can use the index
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- Index Scan using idx_users_email_lower (FAST!)

Problem 3: Unique Constraints Don’t Work as Expected

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE,  -- This is case-sensitive!
    password_hash TEXT
);

INSERT INTO users VALUES (1, '[email protected]', 'hash1');
INSERT INTO users VALUES (2, '[email protected]', 'hash2');
-- ✓ Both succeed! You now have duplicate emails.

To fix this, you need a partial unique index:

CREATE UNIQUE INDEX idx_users_email_unique_lower 
ON users(LOWER(email));

Problem 4: Code Complexity and Maintenance Burden

Every developer needs to remember:

  • Always use LOWER() in INSERT/UPDATE
  • Always use LOWER() in WHERE clauses
  • Always use LOWER() in JOIN conditions
  • Create functional indexes on LOWER(column)
  • Create unique indexes on LOWER(column)

This is error-prone and creates a maintenance nightmare.


The Effective Solution: citext Type

PostgreSQL provides the citext (case-insensitive text) extension that handles all of this automatically:

-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS citext;

-- Create table with citext
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email citext UNIQUE,
    password_hash TEXT
);

Now case-insensitivity works automatically:

-- Insert with mixed case
INSERT INTO users VALUES (1, '[email protected]', 'hash1');

-- Query with different case - it works!
SELECT * FROM users WHERE email = '[email protected]';
-- Returns Alice's row ✓

-- Try to insert duplicate with different case
INSERT INTO users VALUES (2, '[email protected]', 'hash2');
-- ERROR: duplicate key value violates unique constraint "users_email_key"
-- DETAIL: Key (email)=([email protected]) already exists.

Magic! The database handles case-insensitivity at the type level.


How citext Works Under the Hood

citext is not a native type—it’s a domain built on top of TEXT with custom comparison operators:

-- These are all equivalent when comparing citext values
'Alice'::citext = 'alice'::citext  -- true
'Alice'::citext = 'ALICE'::citext  -- true
'Alice'::citext = 'aLiCe'::citext  -- true

Comparison Operators

All standard operators work case-insensitively:

SELECT 
    'alice'::citext = 'ALICE'::citext as equals,           -- true
    'alice'::citext != 'bob'::citext as not_equals,        -- true
    'alice'::citext < 'BOB'::citext as less_than,          -- true (lexicographic)
    'charlie'::citext > 'BOB'::citext as greater_than,     -- true
    'alice'::citext LIKE 'AL%' as pattern_match;           -- true

LIKE and ILIKE

-- With citext, LIKE is already case-insensitive!
SELECT email FROM users WHERE email LIKE 'alice%';
-- Matches: '[email protected]', '[email protected]', '[email protected]'

-- ILIKE still works but is redundant
SELECT email FROM users WHERE email ILIKE 'alice%';
-- Same results

Real-World Examples

Example 1: User Authentication System

CREATE TABLE users (
    id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    email citext UNIQUE NOT NULL,
    username citext UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Create index (standard B-tree works perfectly!)
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);

Registration:

-- User registers with mixed case
INSERT INTO users (email, username, password_hash) 
VALUES ('[email protected]', 'AliceSmith', '$2b$12$...')
RETURNING id;
-- ✓ Success: id = 1

-- Someone tries to register with different casing
INSERT INTO users (email, username, password_hash) 
VALUES ('[email protected]', 'alicesmith', '$2b$12$...');
-- ✗ ERROR: duplicate key value violates unique constraint

Login (email or username):

-- Login with different case variations
SELECT id, password_hash FROM users 
WHERE email = '[email protected]' OR username = 'alicesmith';
-- Returns Alice's record ✓

SELECT id, password_hash FROM users 
WHERE email = '[email protected]' OR username = 'AliceSmith';
-- Returns Alice's record ✓

Example 2: Product SKU Codes

SKUs are often case-insensitive in practice:

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    sku citext UNIQUE NOT NULL,
    name TEXT NOT NULL,
    price NUMERIC(10, 2) NOT NULL
);

INSERT INTO products VALUES 
    (1, 'WIDGET-001', 'Basic Widget', 19.99),
    (2, 'GADGET-002', 'Premium Gadget', 49.99);

Queries work regardless of case:

-- Customer searches with lowercase
SELECT * FROM products WHERE sku = 'widget-001';
-- Returns: WIDGET-001 ✓

-- Warehouse system uses uppercase
SELECT * FROM products WHERE sku = 'WIDGET-001';
-- Returns: WIDGET-001 ✓

-- API uses mixed case
SELECT * FROM products WHERE sku = 'Widget-001';
-- Returns: WIDGET-001 ✓

Duplicate prevention:

-- Try to insert duplicate with different case
INSERT INTO products VALUES (3, 'widget-001', 'Duplicate', 29.99);
-- ✗ ERROR: duplicate key value violates unique constraint

Example 3: Tags and Categories

CREATE TABLE tags (
    id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name citext UNIQUE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Users create tags with various casings
INSERT INTO tags (name) VALUES ('PostgreSQL');
INSERT INTO tags (name) VALUES ('Database');
INSERT INTO tags (name) VALUES ('Tutorial');

Finding or creating tags:

-- Function to get or create tag (case-insensitive lookup)
CREATE FUNCTION get_or_create_tag(tag_name citext) 
RETURNS INTEGER AS $$
DECLARE
    tag_id INTEGER;
BEGIN
    -- Try to find existing tag (case-insensitive)
    SELECT id INTO tag_id FROM tags WHERE name = tag_name;
    
    IF tag_id IS NULL THEN
        -- Create new tag
        INSERT INTO tags (name) VALUES (tag_name) RETURNING id INTO tag_id;
    END IF;
    
    RETURN tag_id;
END;
$$ LANGUAGE plpgsql;

-- These all return the same tag ID
SELECT get_or_create_tag('postgresql');   -- Returns 1
SELECT get_or_create_tag('PostgreSQL');   -- Returns 1
SELECT get_or_create_tag('POSTGRESQL');   -- Returns 1
SELECT get_or_create_tag('PoStGrEsQl');   -- Returns 1

Example 4: Domain Names and Hostnames

Domain names are case-insensitive by specification:

CREATE TABLE websites (
    id INTEGER PRIMARY KEY,
    domain citext UNIQUE NOT NULL,
    owner_id INTEGER NOT NULL,
    ssl_enabled BOOLEAN DEFAULT false
);

INSERT INTO websites VALUES 
    (1, 'Example.com', 100, true),
    (2, 'GitHub.com', 101, true);

Lookups work naturally:

-- User types domain in various ways
SELECT * FROM websites WHERE domain = 'example.com';     -- ✓ Found
SELECT * FROM websites WHERE domain = 'Example.com';     -- ✓ Found
SELECT * FROM websites WHERE domain = 'EXAMPLE.COM';     -- ✓ Found
SELECT * FROM websites WHERE domain = 'ExAmPlE.CoM';     -- ✓ Found

Example 5: File Paths (Case-Insensitive Filesystems)

For systems running on Windows or macOS (case-insensitive filesystems):

CREATE TABLE file_metadata (
    id INTEGER PRIMARY KEY,
    path citext UNIQUE NOT NULL,
    size_bytes BIGINT NOT NULL,
    modified_at TIMESTAMPTZ NOT NULL
);

-- Store file with one casing
INSERT INTO file_metadata VALUES 
    (1, '/Users/Alice/Documents/Report.pdf', 1048576, NOW());

-- Find file with different casing
SELECT * FROM file_metadata 
WHERE path = '/users/alice/documents/report.pdf';
-- ✓ Found (matches filesystem behavior on Mac/Windows)

Advanced Patterns

Pattern 1: Partial citext for Mixed-Case Storage with Case-Insensitive Queries

You want to preserve the original case but query case-insensitively:

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email_display TEXT NOT NULL,      -- Original case preserved
    email_search citext UNIQUE NOT NULL  -- For searching
);

-- Trigger to keep them in sync
CREATE FUNCTION sync_email() RETURNS TRIGGER AS $$
BEGIN
    NEW.email_search := NEW.email_display;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_email_trigger
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_email();

-- Insert with original casing
INSERT INTO users (id, email_display) 
VALUES (1, '[email protected]');
-- email_search is automatically set to '[email protected]' (as citext)

-- Query case-insensitively, get original casing
SELECT email_display FROM users WHERE email_search = '[email protected]';
-- Returns: '[email protected]' ✓

Pattern 2: Case-Insensitive JOINs

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    email citext UNIQUE
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_email citext NOT NULL,
    total NUMERIC(10, 2)
);

-- JOIN works case-insensitively
SELECT c.id, o.total
FROM customers c
JOIN orders o ON c.email = o.customer_email;
-- Matches regardless of case differences in the data

Pattern 3: Migrating Existing TEXT Columns

-- Start with a TEXT column
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    sku TEXT UNIQUE
);

-- Enable citext
CREATE EXTENSION IF NOT EXISTS citext;

-- Migrate to citext
BEGIN;

-- Drop old unique constraint
ALTER TABLE products DROP CONSTRAINT products_sku_key;

-- Change column type
ALTER TABLE products ALTER COLUMN sku TYPE citext;

-- Add unique constraint back
ALTER TABLE products ADD CONSTRAINT products_sku_key UNIQUE (sku);

COMMIT;

Pattern 4: Case-Insensitive Arrays

CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    title TEXT,
    tags citext[] NOT NULL DEFAULT '{}'
);

-- Insert with mixed case tags
INSERT INTO articles VALUES 
    (1, 'PostgreSQL Tips', ARRAY['Database', 'SQL', 'Tutorial']::citext[]);

-- Search for tag (case-insensitive)
SELECT * FROM articles WHERE 'database'::citext = ANY(tags);
-- ✓ Found (matches 'Database')

SELECT * FROM articles WHERE 'TUTORIAL'::citext = ANY(tags);
-- ✓ Found (matches 'Tutorial')

Performance Considerations

Index Performance

citext uses the same B-tree index structure as TEXT:

-- Standard B-tree index works perfectly
CREATE INDEX idx_users_email ON users(email);

-- Query uses index efficiently
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- Index Scan using idx_users_email

Performance is nearly identical to TEXT + functional index, but with simpler code.

Storage Overhead

citext has zero storage overhead compared to TEXT—it’s stored identically:

SELECT pg_column_size('Hello'::text);    -- 9 bytes
SELECT pg_column_size('Hello'::citext);  -- 9 bytes (same!)

Comparison Speed

citext comparisons are slightly slower than TEXT because they perform case-folding:

-- Benchmark: 1 million comparisons
-- TEXT:   ~50ms
-- citext: ~65ms
-- Overhead: ~30% slower

But this is negligible because:

  • Comparisons are very fast (microseconds)
  • You avoid full table scans from forgotten LOWER()
  • You eliminate bugs from inconsistent case handling

When NOT to Use citext

Case 1: Case Matters for Business Logic

If case differences are meaningful, don’t use citext:

-- ❌ WRONG: Password hashes should be case-sensitive!
CREATE TABLE users (
    email citext,
    password_hash citext  -- DON'T DO THIS!
);
-- ✅ CORRECT: Only case-insensitive columns use citext
CREATE TABLE users (
    email citext,
    password_hash TEXT  -- Keep case-sensitive
);

Case 2: Binary Data or Exact Matching Required

-- ❌ WRONG: API keys should be exact matches
CREATE TABLE api_keys (
    key citext PRIMARY KEY  -- Don't do this
);
-- ✅ CORRECT: Use TEXT for exact matching
CREATE TABLE api_keys (
    key TEXT PRIMARY KEY
);

Case 3: International Characters

citext uses database locale for case-folding, which can have issues with some Unicode characters:

-- Turkish 'i' problem
SELECT 'i'::citext = 'İ'::citext;  -- May not work as expected in some locales

For full Unicode case-insensitivity, consider using ICU collations (PostgreSQL 12+):

CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

CREATE TABLE users (
    email TEXT COLLATE case_insensitive
);

Common Pitfalls

Pitfall 1: Mixing TEXT and citext in Comparisons

CREATE TABLE users (email citext);
INSERT INTO users VALUES ('[email protected]');

-- ❌ This works but is implicit casting
SELECT * FROM users WHERE email = '[email protected]';  -- TEXT literal

-- ✅ Better: explicit cast when needed
SELECT * FROM users WHERE email = '[email protected]'::citext;

In practice, PostgreSQL handles this well, but explicit casts make intent clear.

Pitfall 2: Forgetting citext in Function Parameters

-- ❌ WRONG: TEXT parameter loses case-insensitivity
CREATE FUNCTION find_user(user_email TEXT) RETURNS INTEGER AS $$
    SELECT id FROM users WHERE email = user_email;
$$ LANGUAGE SQL;

-- ✅ CORRECT: Use citext parameter
CREATE FUNCTION find_user(user_email citext) RETURNS INTEGER AS $$
    SELECT id FROM users WHERE email = user_email;
$$ LANGUAGE SQL;

Pitfall 3: Not Understanding How GROUP BY Works

CREATE TABLE events (
    id INTEGER,
    tag citext
);

INSERT INTO events VALUES 
    (1, 'PostgreSQL'),
    (2, 'postgresql'),
    (3, 'POSTGRESQL');

-- GROUP BY with citext
SELECT tag, COUNT(*) FROM events GROUP BY tag;
-- Returns:
-- tag          | count
-- -------------|------
-- PostgreSQL   | 3     (all grouped together!)

This might be surprising if you expected separate groups. It’s actually the desired behavior for case-insensitive grouping.


Migration Example: Converting an Existing Application

Let’s migrate a real application from TEXT + LOWER() to citext:

Before (TEXT with LOWER()):

-- Schema
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Application code
def find_user_by_email(email):
    return db.query(
        "SELECT * FROM users WHERE LOWER(email) = LOWER(%s)",
        (email,)
    )

def create_user(email, password):
    db.execute(
        "INSERT INTO users (email) VALUES (LOWER(%s))",
        (email,)
    )

After (citext):

-- Migration
CREATE EXTENSION IF NOT EXISTS citext;

BEGIN;
ALTER TABLE users DROP CONSTRAINT users_email_key;
DROP INDEX idx_users_email_lower;

ALTER TABLE users ALTER COLUMN email TYPE citext;

ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
CREATE INDEX idx_users_email ON users(email);
COMMIT;

-- Simplified application code
def find_user_by_email(email):
    return db.query(
        "SELECT * FROM users WHERE email = %s",  # No LOWER()!
        (email,)
    )

def create_user(email, password):
    db.execute(
        "INSERT INTO users (email) VALUES (%s)",  # No LOWER()!
        (email,)
    )

Benefits:

  • Removed all LOWER() calls
  • Simpler queries
  • Standard indexes instead of functional indexes
  • Less error-prone
  • Same performance

Testing citext Behavior

Here’s a complete test suite:

-- Setup
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE test_citext (value citext);

-- Test 1: Equality
INSERT INTO test_citext VALUES ('Hello');
SELECT value FROM test_citext WHERE value = 'hello';     -- ✓ Returns 'Hello'
SELECT value FROM test_citext WHERE value = 'HELLO';     -- ✓ Returns 'Hello'
SELECT value FROM test_citext WHERE value = 'HeLLo';     -- ✓ Returns 'Hello'

-- Test 2: Uniqueness
CREATE TABLE test_unique (id INTEGER, email citext UNIQUE);
INSERT INTO test_unique VALUES (1, '[email protected]');
INSERT INTO test_unique VALUES (2, '[email protected]');  -- ✗ ERROR: duplicate

-- Test 3: Ordering
CREATE TABLE test_order (name citext);
INSERT INTO test_order VALUES ('Charlie'), ('alice'), ('BOB');
SELECT name FROM test_order ORDER BY name;
-- Returns: alice, BOB, Charlie (case-insensitive sort)

-- Test 4: Pattern Matching
SELECT 'Alice'::citext LIKE 'ali%';      -- ✓ true
SELECT 'Alice'::citext LIKE 'ALI%';      -- ✓ true
SELECT 'Alice'::citext ~ '^ali';         -- ✗ false (regex is still case-sensitive!)
SELECT 'Alice'::citext ~* '^ali';        -- ✓ true (use ~* for case-insensitive regex)

-- Cleanup
DROP TABLE test_citext, test_unique, test_order;

Summary

Key Takeaways:

  1. Use citext for identifiers where case shouldn’t matter (emails, usernames, SKUs)
  2. citext makes UNIQUE constraints work intuitively
  3. Standard B-tree indexes work without needing functional indexes
  4. No need to remember LOWER() in every query
  5. Near-zero performance overhead
  6. Case is preserved for display but ignored for comparison
  7. Don’t use citext for case-sensitive data (passwords, API keys)
  8. Be careful with Unicode edge cases in some locales

The Bottom Line:

citext eliminates an entire class of bugs by making case-insensitivity a type-level guarantee rather than a convention that developers must remember. It’s simpler, safer, and just as fast as the LOWER() pattern.


Further Reading


Next in the series: Item 3 – Use EXCLUSION Constraints for Temporal Data

Leave a Comment