Database Optimization Strategies for High-Traffic Applications

By Sanjay Goraniya

Database performance is often the bottleneck in high-traffic applications. I've spent countless hours optimizing databases that serve millions of queries daily, and I've learned that optimization is both an art and a science. Let me share the strategies that have made the biggest impact.

Understanding the Problem

Before optimizing, you need to understand what's slow (a query sketch for surfacing slow statements follows this list). The most common issues I've encountered:

  1. Missing or inefficient indexes
  2. N+1 query problems
  3. Inefficient joins
  4. Lack of connection pooling
  5. Poor schema design
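
A quick way to surface slow statements in PostgreSQL is the pg_stat_statements extension. A minimal sketch, assuming the extension is installed and enabled (the column is mean_exec_time on PostgreSQL 13+, mean_time on older versions):

Code
-- Top 10 statements by average execution time
SELECT query,
       calls,
       mean_exec_time,   -- average milliseconds per call
       total_exec_time   -- cumulative milliseconds across all calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;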

Indexing Strategies

When to Index

Good index candidates are columns that appear in (a single-column example follows this list):

  • WHERE clauses
  • JOIN conditions
  • ORDER BY clauses
  • GROUP BY clauses
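
For example, a lookup column such as posts.user_id (illustrative table and column names) usually deserves its own single-column index:

Code
-- A single-column index for a column used in WHERE filters and JOIN conditions
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- Lets lookups like this use an index scan instead of scanning the whole table
SELECT * FROM posts WHERE user_id = 42;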

Composite Indexes

The order matters in composite indexes:

Code
-- Good: Index matches query pattern
CREATE INDEX idx_user_status_created ON users(status, created_at);

-- Query that uses this index efficiently
SELECT * FROM users 
WHERE status = 'active' 
ORDER BY created_at DESC;

-- Bad: Wrong column order
CREATE INDEX idx_created_status ON users(created_at, status);
-- With created_at leading, the index can't be used to seek directly on status = 'active';
-- at best it's scanned in created_at order and filtered, which is far less efficient

Covering Indexes

Include all columns the query needs so it can be answered from the index alone (the INCLUDE clause below is PostgreSQL 11+ / SQL Server syntax; on MySQL, add the extra columns to the index key instead):

Code
-- Instead of this query hitting the table
SELECT id, name, email FROM users WHERE status = 'active';

-- Create a covering index
CREATE INDEX idx_users_status_covering 
ON users(status) 
INCLUDE (id, name, email);
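
A quick way to confirm the covering index is doing its job is to check the plan. In PostgreSQL, the goal is an Index Only Scan (which also depends on the table's visibility map being reasonably up to date):

Code
-- With the covering index in place, the plan should show
-- "Index Only Scan using idx_users_status_covering", meaning the table itself is never read
EXPLAIN
SELECT id, name, email FROM users WHERE status = 'active';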

Query Optimization

Avoid N+1 Queries

The Problem:

Code
// Bad: N+1 queries
const users = await db.query('SELECT * FROM users');
for (const user of users) {
  user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}

The Solution:

Code
// Good: Single query with JOIN
const users = await db.query(`
  SELECT 
    u.*,
    -- COALESCE + FILTER give users with no posts an empty array instead of [null]
    COALESCE(json_agg(p.*) FILTER (WHERE p.id IS NOT NULL), '[]'::json) AS posts
  FROM users u
  LEFT JOIN posts p ON p.user_id = u.id
  GROUP BY u.id
`);

// Or: Two queries (sometimes faster)
const users = await db.query('SELECT * FROM users');
const userIds = users.map(u => u.id);
const posts = await db.query(
  'SELECT * FROM posts WHERE user_id = ANY($1)',
  [userIds]
);
// Then map posts to users in application code
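
The mapping step is just an in-memory group-by. A minimal sketch, assuming (as in the snippets above) that db.query resolves to an array of rows:

Code
// Group posts by user_id, then attach each group to its user
const postsByUser = new Map();
for (const post of posts) {
  if (!postsByUser.has(post.user_id)) postsByUser.set(post.user_id, []);
  postsByUser.get(post.user_id).push(post);
}
for (const user of users) {
  user.posts = postsByUser.get(user.id) ?? [];
}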

Use EXPLAIN

Always use EXPLAIN to understand query execution:

Code
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2023-01-01'
GROUP BY u.id, u.name
HAVING COUNT(p.id) > 10;

Look for:

  • Seq Scan nodes on large tables where you expected an Index Scan
  • Nodes whose cost or actual time dominates the plan
  • Estimated row counts that differ wildly from the actual rows (a sign that statistics are stale)

Limit and Pagination

Always use LIMIT, and paginate efficiently: OFFSET has to scan and discard every skipped row, so it degrades as you page deeper, while cursor (keyset) pagination stays fast at any depth:

Code
-- Good: Cursor-based pagination (for large datasets)
SELECT * FROM posts 
WHERE id > $1 
ORDER BY id 
LIMIT 20;

-- Good: Offset pagination (for smaller datasets)
SELECT * FROM posts 
ORDER BY created_at DESC 
LIMIT 20 OFFSET $1;

-- Bad: No limit
SELECT * FROM posts ORDER BY created_at DESC;
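
If you want cursor-style pagination but the ordering is by created_at rather than id, use the timestamp plus the id as a tie-breaker. A sketch, assuming an index on (created_at, id); $1 and $2 are the created_at and id of the last row on the previous page:

Code
-- Keyset variant of the created_at ordering
SELECT * FROM posts
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;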

Connection Pooling

Why It Matters

Without connection pooling, each request creates a new database connection, which is expensive:

Code
// Bad: New connection per request
app.get('/users', async (req, res) => {
  const connection = await db.connect();
  const users = await connection.query('SELECT * FROM users');
  connection.release();
  res.json(users);
});

// Good: Connection pool
const pool = new Pool({
  max: 20, // Maximum connections
  min: 5,  // Minimum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

app.get('/users', async (req, res) => {
  const users = await pool.query('SELECT * FROM users');
  res.json(users);
});
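
When a request needs more than one statement (a transaction, for example), check a client out of the pool and always return it, even on error. A sketch in the same node-postgres style as above; the route and columns are illustrative:

Code
app.post('/orders', async (req, res) => {
  const client = await pool.connect(); // borrow one connection from the pool
  try {
    await client.query('BEGIN');
    await client.query(
      'INSERT INTO orders (user_id, total) VALUES ($1, $2)',
      [req.body.userId, req.body.total]
    );
    await client.query('COMMIT');
    res.status(201).end();
  } catch (err) {
    await client.query('ROLLBACK');
    res.status(500).json({ error: 'order failed' });
  } finally {
    client.release(); // always return the connection to the pool
  }
});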

Schema Design

Normalize, But Not Too Much

Third Normal Form is usually enough:

Code
-- Good: Normalized but practical
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  name VARCHAR(255) NOT NULL
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  total DECIMAL(10, 2) NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Bad: Over-normalized
CREATE TABLE users (id, email, name);
CREATE TABLE user_emails (user_id, email_type, email_value);
CREATE TABLE user_names (user_id, name_type, name_value);
-- Too many joins needed for simple queries
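
One related detail: in PostgreSQL, a REFERENCES constraint does not create an index on the referencing column, so foreign keys you join or filter on (like orders.user_id above) still need explicit indexes:

Code
-- The foreign key constraint alone doesn't index orders.user_id
CREATE INDEX idx_orders_user_id ON orders(user_id);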

Denormalize for Read Performance

Sometimes denormalization helps:

Code
-- Add redundant data for read performance
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  user_name VARCHAR(255), -- Denormalized
  title VARCHAR(255),
  content TEXT,
  created_at TIMESTAMP
);

-- Keep the denormalized user_name in sync when users.name changes
CREATE OR REPLACE FUNCTION update_posts_user_name()
RETURNS trigger AS $$
BEGIN
  UPDATE posts SET user_name = NEW.name WHERE user_id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_post_user_name
AFTER UPDATE OF name ON users
FOR EACH ROW
EXECUTE FUNCTION update_posts_user_name();

Caching Strategies

Application-Level Caching

Code
// Redis caching example
async function getUser(id) {
  const cacheKey = `user:${id}`;
  
  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Cache miss - query database
  const user = await db.query('SELECT * FROM users WHERE id = $1', [id]);
  
  // Cache for 5 minutes
  await redis.setex(cacheKey, 300, JSON.stringify(user));
  
  return user;
}
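
The other half of this pattern is invalidation: when the row changes, drop the cached copy so the next read repopulates it. A minimal sketch using the same redis client and key scheme (updateUser is an illustrative helper, not part of the code above):

Code
async function updateUser(id, fields) {
  // Write the change to the database first...
  await db.query('UPDATE users SET name = $1 WHERE id = $2', [fields.name, id]);
  // ...then delete the stale cache entry; the next getUser(id) call repopulates it
  await redis.del(`user:${id}`);
}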

Query Result Caching

For expensive queries that don't change often:

Code
-- Materialized views for complex aggregations
CREATE MATERIALIZED VIEW user_stats AS
SELECT 
  user_id,
  COUNT(*) as post_count,
  SUM(likes) as total_likes,
  AVG(rating) as avg_rating
FROM posts
GROUP BY user_id;

-- CONCURRENTLY keeps the view readable during the refresh,
-- but it requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_user_stats_user_id ON user_stats (user_id);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
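
If the database should handle the refresh schedule itself, one option is the pg_cron extension (assuming it is installed); a plain cron job or application scheduler issuing the same REFRESH statement works just as well:

Code
-- Refresh the materialized view every 10 minutes via pg_cron
SELECT cron.schedule(
  'refresh-user-stats',
  '*/10 * * * *',
  'REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats'
);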

Monitoring and Profiling

Slow Query Logs

Enable slow query logging:

Code
-- PostgreSQL (ALTER SYSTEM applies server-wide; a plain SET only affects the current session)
ALTER SYSTEM SET log_min_duration_statement = 1000; -- log queries taking longer than 1 second
SELECT pg_reload_conf();

-- MySQL (these are global server variables)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

Database Metrics to Monitor

  1. Query execution time
  2. Connection pool usage
  3. Cache hit rates
  4. Index usage (both queryable in PostgreSQL; see the sketch after this list)
  5. Lock contention
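
A minimal PostgreSQL sketch for the last two, read straight from the built-in statistics views:

Code
-- Buffer cache hit ratio across all databases (the closer to 1.0, the better)
SELECT sum(blks_hit)::float / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;

-- Indexes that are never scanned are pure write overhead; review them as drop candidates
SELECT relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 20;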

Real-World Example

Problem: E-commerce site with 10M+ products, queries taking 5+ seconds.

Solution:

  1. Added composite indexes on frequently queried columns
  2. Implemented read replicas for product searches
  3. Added Redis caching for popular products
  4. Optimized JOIN queries using covering indexes
  5. Implemented connection pooling (reduced connection overhead by 80%)

Result: Average query time reduced from 5 seconds to 50ms.

Best Practices Summary

  1. Index strategically - Not everything needs an index
  2. Profile first - Use EXPLAIN to find bottlenecks
  3. Avoid N+1 queries - Use JOINs or batch queries
  4. Use connection pooling - Always
  5. Cache intelligently - Cache what changes infrequently
  6. Monitor continuously - Set up alerts for slow queries
  7. Test with production-like data - Synthetic data doesn't reveal real issues

Conclusion

Database optimization is an ongoing process. What works for 1,000 users won't work for 1,000,000. The key is to:

  • Measure before optimizing
  • Index based on actual query patterns
  • Monitor continuously
  • Refactor incrementally

Remember: Premature optimization is the root of all evil, but ignoring performance is the root of all production incidents. Find the balance.

What database optimization challenges have you faced? I'd love to hear about your experiences and solutions.
