Problem-First Architecture Patterns In Web Applications

Table of Contents

Overview

Here’s a problem-first mapping — each scenario describes a pressure point a web application hits, paired with the tools/patterns that address it.

Scenarios

Scenario: Your database is getting hammered by repeated identical reads (product pages, user profiles, config data)

Solution: Redis or Memcached as a read-through cache, sitting in front of Postgres. The pattern is cache-aside (check cache, miss → query DB → populate cache) with a TTL or explicit invalidation on writes.

Solution: Redis or Memcached as a read-through cache, sitting in front of Postgres. The pattern is cache-aside (check cache, miss → query DB → populate cache) with a TTL or explicit invalidation on writes.

Scenario: A single write needs to trigger multiple downstream effects, and you don’t want the user’s request blocked waiting for all of them

Solution: Message queue (RabbitMQ, SQS, or Redis-backed BullMQ) for fire-and-forget background jobs — sending emails, generating thumbnails, updating search indexes. The producer publishes once and moves on; workers process asynchronously.

Solution: Message queue (RabbitMQ, SQS, or Redis-backed BullMQ) for fire-and-forget background jobs — sending emails, generating thumbnails, updating search indexes. The producer publishes once and moves on; workers process asynchronously.

Scenario: Multiple services need to react to the same business event (order placed → inventory, billing, notifications, analytics all care)

Solution: Kafka (or a lighter alternative like Redis Streams/NATS for smaller scale) for event streaming with multiple independent consumer groups. This is different from a task queue — the event is broadcast and retained, not consumed-and-gone.

Solution: Kafka (or a lighter alternative like Redis Streams/NATS for smaller scale) for event streaming with multiple independent consumer groups. This is different from a task queue — the event is broadcast and retained, not consumed-and-gone.

Scenario: Users need to search by free text, filters, or fuzzy matching across large datasets, and Postgres LIKE queries are too slow

Solution: Elasticsearch or OpenSearch for full-text/faceted search; Meilisearch or Typesense if you want something lighter and easier to operate for smaller catalogs.

Solution: Elasticsearch or OpenSearch for full-text/faceted search; Meilisearch or Typesense if you want something lighter and easier to operate for smaller catalogs.

Scenario: You’re storing and serving images, videos, documents, or other large binary files

Solution: Object storage (S3 or MinIO for self-hosted) rather than the database. Database stores the reference/metadata; the blob lives in object storage, often served via CDN.

Solution: Object storage (S3 or MinIO for self-hosted) rather than the database. Database stores the reference/metadata; the blob lives in object storage, often served via CDN.

Scenario: Analytical/reporting queries (aggregations across millions of rows) are slowing down your transactional database

Solution: Either read replicas of Postgres dedicated to reporting, or — if the analytical workload is heavy — a columnar store like ClickHouse fed via CDC (Debezium + Kafka) from Postgres. This separates OLTP from OLAP.

Solution: Either read replicas of Postgres dedicated to reporting, or — if the analytical workload is heavy — a columnar store like ClickHouse fed via CDC (Debezium + Kafka) from Postgres. This separates OLTP from OLAP.

Scenario: A single Postgres instance can’t handle the write/read volume anymore

Solution: Read replicas for read-heavy scaling first (cheapest fix). If writes are the bottleneck, partitioning/sharding strategies (by tenant, by date range) or moving high-volume specific tables to a purpose-built store (e.g., time-series data to TimescaleDB).

Solution: Read replicas for read-heavy scaling first (cheapest fix). If writes are the bottleneck, partitioning/sharding strategies (by tenant, by date range) or moving high-volume specific tables to a purpose-built store (e.g., time-series data to TimescaleDB).

Scenario: Connection exhaustion — too many app instances opening too many DB connections

Solution: PgBouncer (connection pooler) sitting between app and Postgres, using transaction-mode pooling to multiplex many client connections onto fewer actual DB connections.

Scenario: You need session storage, rate limiting, or distributed locks across multiple app instances

Solution: Redis again, but for a different purpose than caching — atomic operations (INCR for rate limiting, SETNX/Redlock for distributed locks), and shared session state so users aren’t pinned to one server.

Solution: Redis again, but for a different purpose than caching — atomic operations (INCR for rate limiting, SETNX/Redlock for distributed locks), and shared session state so users aren't pinned to one server.

Scenario: Real-time updates need to reach connected clients (chat, live dashboards, notifications)

Solution: WebSockets with a pub/sub backbone (Redis pub/sub or NATS) so any app instance can publish a message and the instance holding that user’s WebSocket connection delivers it. This solves the “which server is this user connected to” problem in a horizontally-scaled setup.

Solution: WebSockets with a pub/sub backbone (Redis pub/sub or NATS) so any app instance can publish a message and the instance holding that user's WebSocket connection delivers it. This solves the "which server is this user connected to" problem in a horizontally-scaled setup.

Scenario: You need strong consistency for some operations (payments, inventory counts) but eventual consistency is fine elsewhere

Solution: Keep Postgres transactions for the strict-consistency paths; use Kafka/event-driven updates for everything else (search index updates, cache invalidation, analytics). This is the core CQRS-lite pattern — don’t force one consistency model on the whole system.

Solution: Keep Postgres transactions for the strict-consistency paths; use Kafka/event-driven updates for everything else (search index updates, cache invalidation, analytics). This is the core CQRS-lite pattern — don't force one consistency model on the whole system.

Scenario: Static assets (JS, CSS, images) are slow for geographically distributed users

Solution: CDN (Cloudflare, CloudFront) in front of object storage or your app server, with cache headers controlling freshness.

Solution: CDN (Cloudflare, CloudFront) in front of object storage or your app server, with cache headers controlling freshness.

Scenario: You need to keep a search index or cache in sync with database changes without littering your application code with manual sync logic

Solution: Change Data Capture (Debezium reading Postgres’s WAL) → Kafka → consumers update Elasticsearch/Redis. This decouples “what changed in the DB” from “who needs to know.”

Solution: Change Data Capture (Debezium reading Postgres's WAL) → Kafka → consumers update Elasticsearch/Redis. This decouples "what changed in the DB" from "who needs to know."

Scenario: Long-running tasks (report generation, video transcoding, ML inference) shouldn’t block the request-response cycle

Solution: Task queue with a worker pool (Celery for Python, BullMQ for Node, or Kafka consumers for higher throughput) — return a job ID immediately, client polls or gets a websocket notification on completion.

Solution: Task queue with a worker pool (Celery for Python, BullMQ for Node, or Kafka consumers for higher throughput) — return a job ID immediately, client polls or gets a websocket notification on completion.

Scenario: You need to enforce business rules or transformations on streaming data before it lands anywhere (deduplication, enrichment, windowed aggregation)

Solution: Stream processing frameworks — Kafka Streams, Apache Flink, or ksqlDB. This sits between “raw events in Kafka” and “consumers,” doing the heavy lifting so downstream services get pre-processed data (e.g., “rolling 5-minute order count per region” rather than raw order events).

Solution: Stream processing frameworks — Kafka Streams, Apache Flink, or ksqlDB. This sits between "raw events in Kafka" and "consumers," doing the heavy lifting so downstream services get pre-processed data (e.g., "rolling 5-minute order count per region" rather than raw order events).

Scenario: You’re dealing with geographically distributed users and need data to be close to them, not just static assets

Solution: Multi-region database replication (Postgres logical replication, or distributed SQL like CockroachDB/YugabyteDB if you need multi-region writes with strong consistency). This is a step beyond CDN — it’s about write locality, not just read caching.

Solution: Multi-region database replication (Postgres logical replication, or distributed SQL like CockroachDB/YugabyteDB if you need multi-region writes with strong consistency). This is a step beyond CDN — it's about write locality, not just read caching.

Scenario: You need feature flags or gradual rollouts without redeploying

Solution: A feature flag service (LaunchDarkly, Unleash, or even a simple Redis-backed flag store) lets you toggle behavior per-user or per-percentage without code changes — useful for canary releases and A/B testing.

Solution: A feature flag service (LaunchDarkly, Unleash, or even a simple Redis-backed flag store) lets you toggle behavior per-user or per-percentage without code changes — useful for canary releases and A/B testing.

Scenario: Multiple microservices each have their own database, but a single business transaction spans several of them

Solution: Saga pattern — either orchestration-based (a coordinator service calls each step and handles compensation on failure) or choreography-based (each service reacts to events and emits its own, using your Kafka/event bus). This replaces distributed transactions (2PC), which don’t scale well.

Saga orchestration based

Saga choreography pattern

Scenario: You need to handle bursty traffic (flash sales, viral content) without over-provisioning constantly

Solution: Queue-based load leveling — incoming requests get queued (SQS, Kafka) and processed at a sustainable rate, with auto-scaling workers. The user gets an “accepted, processing” response rather than the backend falling over trying to do everything synchronously.

Solution: Queue-based load leveling — incoming requests get queued (SQS, Kafka) and processed at a sustainable rate, with auto-scaling workers. The user gets an "accepted, processing" response rather than the backend falling over trying to do everything synchronously.

Scenario: You need to serve personalized or computed content (recommendations, feeds) without recomputing on every request

Solution: Materialized views (Postgres) for moderate complexity, or precomputed feed tables updated by background jobs (fan-out-on-write pattern, common in social feeds) — trades storage for read speed.

Solution: Materialized views (Postgres) for moderate complexity, or precomputed feed tables updated by background jobs (fan-out-on-write pattern, common in social feeds) — trades storage for read speed.

Scenario: Time-series data (metrics, IoT sensor readings, financial ticks) is growing unbounded and queries need to be fast over time ranges

Solution: TimescaleDB (Postgres extension) for moderate scale, or InfluxDB/QuestDB for purpose-built time-series workloads with automatic downsampling and retention policies.

Solution: TimescaleDB (Postgres extension) for moderate scale, or InfluxDB/QuestDB for purpose-built time-series workloads with automatic downsampling and retention policies.

Scenario: You’re integrating with many third-party APIs and need resilience against their failures/slowness

Solution: Circuit breaker pattern (libraries like resilience4j for Java, or built into service meshes) — stop calling a failing dependency temporarily, fail fast, and recover automatically once it’s healthy again.

Solution: Circuit breaker pattern (libraries like resilience4j for Java, or built into service meshes) — stop calling a failing dependency temporarily, fail fast, and recover automatically once it's healthy again.

Conclusion

That covers most of the “shapes” of problems architects run into — caching, decoupling, search, analytics, consistency, geographic distribution, and resilience.

A useful mental model: Postgres is your source of truth, Redis is your speed layer (cache, sessions, pub/sub, locks), and Kafka/queues are your decoupling layer (so services don’t call each other directly and don’t block on each other). Everything else — Elasticsearch, ClickHouse, object storage, CDN — gets added when a specific access pattern (search, analytics, large files, geographic distribution) outgrows what those three can do well.

Leave a Comment