Data Modeling for Scalable Applications: Normalization vs Denormalization

By Sanjay Goraniya

Data modeling is one of the most important decisions you'll make when designing an application. Get it right, and your application scales beautifully. Get it wrong, and you'll be fighting performance issues for years. After modeling databases for applications serving millions of users, I've learned that the answer isn't "always normalize" or "always denormalize": it's "it depends."

Understanding Normalization

What is Normalization?

Normalization is the process of organizing data to reduce redundancy and improve data integrity.

Normal Forms

First Normal Form (1NF)

  • Each column contains atomic values
  • No repeating groups
Code
-- Bad: Repeating groups
CREATE TABLE users (
  id INT,
  name VARCHAR(100),
  phone1 VARCHAR(20),
  phone2 VARCHAR(20),
  phone3 VARCHAR(20)
);

-- Good: 1NF
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE user_phones (
  user_id INT REFERENCES users(id),
  phone VARCHAR(20)
);

Second Normal Form (2NF)

  • 1NF + all non-key attributes depend on the full primary key
Code
-- Bad: Partial dependency (product_name depends on only
-- part of the composite key)
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  product_name VARCHAR(100), -- Depends only on product_id
  quantity INT,
  price DECIMAL,
  PRIMARY KEY (order_id, product_id)
);

-- Good: 2NF
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  quantity INT,
  price DECIMAL,
  PRIMARY KEY (order_id, product_id)
);

CREATE TABLE products (
  product_id INT PRIMARY KEY,
  name VARCHAR(100)
);

Third Normal Form (3NF)

  • 2NF + no transitive dependencies
Code
-- Bad: Transitive dependency
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  customer_name VARCHAR(100), -- Depends on customer_id, not order_id
  total DECIMAL
);

-- Good: 3NF
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT REFERENCES customers(customer_id),
  total DECIMAL
);

When to Normalize

Benefits of Normalization

  1. Data integrity - No redundancy, no inconsistencies
  2. Storage efficiency - Less duplicate data
  3. Easier updates - Update in one place
  4. Flexibility - Easier to change schema
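
To see "easier updates" concretely, here is a sketch using the customers/orders tables from the 3NF example above (customer_id 42 is a made-up example value):

```sql
-- Normalized: renaming a customer touches exactly one row
UPDATE customers SET name = 'Acme Corp' WHERE customer_id = 42;

-- Denormalized (customer_name copied into orders): every matching
-- order row must be updated, and a missed row leaves stale data
UPDATE orders SET customer_name = 'Acme Corp' WHERE customer_id = 42;
```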

Good for Normalization

  • OLTP systems - Transaction processing
  • Data consistency critical - Financial systems
  • Frequent updates - User profiles, settings
  • Complex relationships - Many-to-many relationships
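
For the many-to-many case, the normalized answer is a junction table. A minimal sketch, assuming a hypothetical categories table alongside the products table used elsewhere in this post:

```sql
CREATE TABLE categories (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100)
);

-- Junction table: each row links one product to one category
CREATE TABLE product_categories (
  product_id INT REFERENCES products(id),
  category_id INT REFERENCES categories(id),
  PRIMARY KEY (product_id, category_id)
);
```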

Example: Normalized Schema

Code
-- Users
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE,
  name VARCHAR(100)
);

-- Orders
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT REFERENCES users(id),
  created_at TIMESTAMP
);

-- Products (defined before order_items so the reference resolves)
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  price DECIMAL
);

-- Order Items
CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT,
  price DECIMAL
);

When to Denormalize

Benefits of Denormalization

  1. Read performance - Fewer JOINs
  2. Simpler queries - Single table reads
  3. Better for analytics - Pre-aggregated data
  4. Faster reporting - Less computation
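
As a sketch of what "pre-aggregated data" can look like, a hypothetical daily_sales rollup table can be rebuilt from an orders table (assuming it has total and created_at columns) with a PostgreSQL upsert:

```sql
CREATE TABLE daily_sales (
  day DATE PRIMARY KEY,
  order_count INT,
  revenue DECIMAL
);

-- Recompute each day's totals; reports then read this table directly
INSERT INTO daily_sales (day, order_count, revenue)
SELECT created_at::date, COUNT(*), SUM(total)
FROM orders
GROUP BY created_at::date
ON CONFLICT (day) DO UPDATE
SET order_count = EXCLUDED.order_count,
    revenue = EXCLUDED.revenue;
```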

Good for Denormalization

  • Read-heavy workloads - Analytics, reporting
  • Performance critical - High-traffic reads
  • Simple queries - Few relationships
  • Data warehouse - OLAP systems

Example: Denormalized Schema

Code
-- Denormalized order view
CREATE TABLE order_summary (
  order_id INT PRIMARY KEY,
  user_id INT,
  user_name VARCHAR(100), -- Denormalized
  user_email VARCHAR(255), -- Denormalized
  total DECIMAL,
  item_count INT, -- Pre-calculated
  created_at TIMESTAMP
);

Hybrid Approach

Normalize for Writes, Denormalize for Reads

Code
-- Normalized: Write model
CREATE TABLE users (id, email, name);
CREATE TABLE orders (id, user_id, total, created_at);
CREATE TABLE order_items (id, order_id, product_id, quantity);

-- Denormalized: Read model (materialized view)
CREATE MATERIALIZED VIEW order_details AS
SELECT 
  o.id AS order_id,
  u.id AS user_id,
  u.name AS user_name,
  u.email AS user_email,
  o.total,
  COUNT(oi.id) AS item_count,
  o.created_at
FROM orders o
JOIN users u ON u.id = o.user_id
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id, u.id, u.name, u.email, o.total, o.created_at;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX ON order_details (order_id);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY order_details;

Real-World Patterns

Pattern 1: User Profile

Normalized:

Code
CREATE TABLE users (id, email, name);
CREATE TABLE user_profiles (user_id, bio, avatar_url);
CREATE TABLE user_settings (user_id, theme, language);

Denormalized (for read performance):

Code
CREATE TABLE users (
  id,
  email,
  name,
  bio, -- Denormalized
  avatar_url, -- Denormalized
  theme, -- Denormalized
  language -- Denormalized
);

When to use: when the profile is read frequently but updated rarely.

Pattern 2: Order with Items

Normalized:

Code
CREATE TABLE orders (id, user_id, total);
CREATE TABLE order_items (id, order_id, product_id, quantity, price);

Denormalized (for analytics):

Code
CREATE TABLE order_analytics (
  order_id,
  user_id,
  total,
  item_count, -- Pre-calculated
  product_ids, -- Array or JSON
  created_at
);

When to use: For reporting and analytics.
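
One way to populate such a table is a sketch like the following (PostgreSQL, using ARRAY_AGG for the product_ids column; JSON_AGG would suit a JSON column instead):

```sql
INSERT INTO order_analytics (order_id, user_id, total, item_count, product_ids, created_at)
SELECT
  o.id,
  o.user_id,
  o.total,
  COUNT(oi.id),
  ARRAY_AGG(oi.product_id), -- collect line-item product ids into an array
  o.created_at
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id, o.user_id, o.total, o.created_at;
```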

Pattern 3: Social Feed

Normalized:

Code
CREATE TABLE posts (id, user_id, content, created_at);
CREATE TABLE users (id, name, avatar);

Denormalized (for feed):

Code
CREATE TABLE feed_items (
  id,
  post_id,
  user_id,
  user_name, -- Denormalized
  user_avatar, -- Denormalized
  content,
  created_at
);

When to use: when the feed is read frequently and posts are updated rarely.
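
Populating feed_items typically happens at write time: when a post is created, copy the author's current name and avatar into the feed row. A sketch, where the $1 bind parameter stands for the newly created post's id:

```sql
INSERT INTO feed_items (post_id, user_id, user_name, user_avatar, content, created_at)
SELECT p.id, p.user_id, u.name, u.avatar, p.content, p.created_at
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE p.id = $1;
```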

CQRS Pattern

Command Query Responsibility Segregation:

  • Write model: Normalized (consistency)
  • Read model: Denormalized (performance)
Code
-- Write: Normalized
CREATE TABLE users (id, email, name);
CREATE TABLE orders (id, user_id, total);

-- Read: Denormalized view
CREATE VIEW order_read_model AS
SELECT 
  o.id,
  o.total,
  u.name as user_name,
  u.email as user_email
FROM orders o
JOIN users u ON u.id = o.user_id;

Migration Strategy

From Normalized to Denormalized

Code
-- Step 1: Add denormalized columns
ALTER TABLE orders ADD COLUMN user_name VARCHAR(100);
ALTER TABLE orders ADD COLUMN user_email VARCHAR(255);

-- Step 2: Backfill
UPDATE orders o
SET 
  user_name = u.name,
  user_email = u.email
FROM users u
WHERE o.user_id = u.id;

-- Step 3: Keep in sync (trigger function + trigger)
CREATE FUNCTION update_orders_user_data() RETURNS trigger AS $$
BEGIN
  UPDATE orders
  SET user_name = NEW.name,
      user_email = NEW.email
  WHERE user_id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_user_data
AFTER UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_orders_user_data();

Best Practices

  1. Start normalized - Easier to denormalize later
  2. Denormalize for reads - When reads >> writes
  3. Keep in sync - Use triggers or application logic
  4. Document decisions - Why you denormalized
  5. Measure performance - Verify improvements
  6. Consider CQRS - Separate read/write models
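
Point 5 deserves emphasis: verify improvements with real query plans and timings, not assumptions. In PostgreSQL, for example, a listing query can be profiled before and after denormalizing:

```sql
-- Run against both the old and the new schema and compare timings
EXPLAIN ANALYZE
SELECT o.id, u.name, o.created_at
FROM orders o
JOIN users u ON u.id = o.user_id
ORDER BY o.created_at DESC
LIMIT 20;
```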

Real-World Example

Application: E-commerce platform

Challenge: Order listing page slow (5+ seconds)

Initial Schema (Normalized):

Code
SELECT o.*, u.name, u.email, 
       COUNT(oi.id) as item_count
FROM orders o
JOIN users u ON u.id = o.user_id
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id, u.id, u.name, u.email
ORDER BY o.created_at DESC
LIMIT 20;
-- 5+ seconds (multiple JOINs, GROUP BY over the whole table)

Optimized Schema (Denormalized):

Code
-- Denormalized table
CREATE TABLE order_list (
  order_id INT PRIMARY KEY,
  user_name VARCHAR(100),
  user_email VARCHAR(255),
  total DECIMAL,
  item_count INT, -- Pre-calculated
  created_at TIMESTAMP
);

CREATE INDEX ON order_list (created_at);

-- Fast query
SELECT * FROM order_list
ORDER BY created_at DESC
LIMIT 20;
-- ~50ms (single indexed table read)

Result:

  • Query time: 5s → 50ms
  • User experience: Much better
  • Trade-off: Need to keep data in sync

Conclusion

Data modeling is about trade-offs:

  • Normalize for consistency and flexibility
  • Denormalize for performance and simplicity
  • Hybrid for best of both worlds

The key is understanding your access patterns:

  • Read-heavy? Consider denormalization
  • Write-heavy? Stay normalized
  • Both? Use CQRS pattern

Remember: You can always denormalize later, but it's harder to normalize a denormalized schema. Start normalized, optimize based on actual needs.

What data modeling challenges have you faced? How did you balance normalization and denormalization?
