A Comprehensive Guide to Fundamental Concepts of System Design

Overview

The world of software development increasingly demands the ability to design and build large-scale distributed systems. Whether you’re preparing for a system design interview or aiming to architect robust applications, understanding the core concepts is paramount. This guide will take you through the essential building blocks of system design, drawing from a comprehensive tutorial covering everything from a computer’s inner workings to advanced deployment strategies.

This post is a recap of the content from this video; feel free to watch the video for the full walkthrough.

Laying the Foundation: Understanding the Basics

Before diving into the complexities of distributed systems, it’s crucial to grasp the fundamental architecture of an individual computer.

  • The Layered System: Computers operate through a layered system, each part optimized for specific tasks. At the very core, computers understand only binary code (zeros and ones), represented as bits. Eight bits form a byte, which can represent a single character or number. These bytes are then organized into larger units like kilobytes, megabytes, gigabytes, and terabytes for storage.
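
To make these units concrete, here is a tiny Python sketch (an illustrative helper assuming decimal, 1,000-based units; binary, 1,024-based units are also common):

```python
def human_readable(num_bytes: int) -> str:
    """Convert a raw byte count into a human-friendly unit (decimal: 1 KB = 1,000 B)."""
    size = float(num_bytes)
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000

print(human_readable(512))          # 512.0 B
print(human_readable(5_000_000))    # 5.0 MB
print(human_readable(3 * 10**12))   # 3.0 TB
```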

  • Storage: Data persistence relies on computer disk storage, which can be either HDD (Hard Disk Drive) or SSD (Solid State Drive). Disk storage is non-volatile, meaning it retains data even without power, housing the operating system, applications, and user files. While HDDs offer large storage capacities, SSDs provide significantly faster data retrieval speeds.

  • RAM (Random Access Memory): Serving as the primary active data holder, RAM stores data structures, variables, and applications currently in use. This volatile memory allows for quick read and write access, holding program variables, intermediate computations, and the runtime stack.

  • Cache: To further optimize data access, computers utilize cache memory, which is smaller and faster than RAM. The CPU checks different levels of cache (L1, L2, L3) before accessing RAM, storing frequently used data to reduce average access time and improve CPU performance.

  • CPU (Central Processing Unit): The brain of the computer, the CPU fetches, decodes, and executes instructions. Code written in high-level languages needs to be compiled into machine code before the CPU can process it.

  • Motherboard (Main Board): This component acts as the central connection point, providing the pathways for data to flow between the CPU, RAM, disk, and other peripherals.

Building Production-Ready Applications: A High-Level Architecture

Moving beyond a single computer, designing production-ready applications involves orchestrating multiple components to handle user requests, store data, and ensure smooth operation.

  • CI/CD Pipeline (Continuous Integration and Continuous Deployment): This automated process ensures that code goes from the repository through tests and checks to the production server without manual intervention. Platforms like Jenkins or GitHub Actions are used for this automation.

  • Load Balancers and Reverse Proxies: To manage numerous user requests and maintain a smooth experience during traffic spikes, load balancers and reverse proxies like Nginx distribute requests evenly across multiple servers.

  • External Storage Servers: Instead of storing data on the same production servers, external storage servers connected over a network are used for persistent data storage.

  • Multiple Services: Modern applications often consist of multiple interconnected services.

  • Logging and Monitoring Systems: Keeping a close watch on every micro-interaction, logging and monitoring systems store logs and analyze data, often using external services.

  • Alerting Services: When issues are detected (e.g., failing requests), alerting services send notifications to keep developers informed, often integrating directly with platforms like Slack so problems can be acted on immediately.

  • Debugging and Hotfixes: When problems arise, developers first analyze logs to identify the issue, then replicate it in a staging environment (never debug in production!). Once the bug is fixed, a hotfix, a quick temporary solution, is rolled out before a more permanent fix.

The Cornerstones of System Design: Moving, Storing, and Transforming Data

A well-designed system hinges on effectively managing data.

  • Moving Data: Ensuring seamless and secure data flow between different parts of the system is crucial. This involves optimizing speed and security for user requests reaching servers or data transfers between databases.

  • Storing Data: This goes beyond simply choosing a database (SQL or NoSQL). It involves understanding access patterns, implementing indexing strategies, and establishing robust backup solutions to ensure data security and availability.

  • Transforming Data: Turning raw data into meaningful information is essential. This includes tasks like aggregating log files for analysis or converting user input into different formats.

Key Principles Guiding System Design

Several crucial concepts underpin sound system design decisions.

  • The CAP Theorem (Brewer’s Theorem): This theorem states that a distributed system can only achieve two out of three of the following properties simultaneously:
    • Consistency: All nodes in the system have the same data at the same time.
    • Availability: The system is always operational and responsive to requests.
    • Partition Tolerance: The system continues to function even when network partitions (communication disruptions) occur.
  Every design choice involves tradeoffs between these three. For instance, a banking system prioritizes consistency and partition tolerance, potentially sacrificing some availability during temporary processing delays.

  • Availability: This measures a system’s operational performance and reliability, indicating whether it’s up and running when users need it. It’s often measured as a percentage of uptime, with “five nines” (99.999%) being a common goal, allowing only a few minutes of downtime per year. Service Level Objectives (SLOs) are internal goals for system performance and availability, while Service Level Agreements (SLAs) are formal contracts with users defining the minimum service level.
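
As a quick illustration of what those uptime targets permit, the short calculation below (simple arithmetic, not from the video) converts an availability percentage into allowed downtime per year:

```python
# Allowed downtime per year for common availability targets.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

targets = [("two nines", 0.99), ("three nines", 0.999),
           ("four nines", 0.9999), ("five nines", 0.99999)]

for name, availability in targets:
    downtime_minutes = SECONDS_PER_YEAR * (1 - availability) / 60
    print(f"{name} ({availability:.3%}): ~{downtime_minutes:,.1f} minutes of downtime per year")
```

Five nines works out to roughly five minutes of downtime per year, which is why it is such a demanding target.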

  • Resilience: Building resilience means anticipating and preparing for failures. This involves:
    • Reliability: Ensuring the system works correctly and consistently.
    • Fault Tolerance: How the system handles unexpected failures or attacks.
    • Redundancy: Having backups in place so that if one part fails, another can take over.

  • Performance: Measured through:
    • Throughput: The amount of data a system can handle over a specific period (e.g., requests per second for servers, queries per second for databases, bytes per second for data transfer).
    • Latency: The time it takes to handle a single request (time to get a response). Optimizing for one can sometimes impact the other.

  • Good Design Principles: Beyond technicalities, good design focuses on scalability (system growth with user base), maintainability (ease for future developers to understand and improve), and efficiency (optimal resource utilization). Importantly, it also involves planning for failure. The cost of redesigning a poorly designed system can be immense compared to investing in a solid foundation from the start.

Navigating the Network: Connecting the Pieces

Understanding networking is fundamental to designing distributed systems.

  • Networking Basics: Communication between computers relies on the IP address, a unique identifier for each device on a network. IPv4 uses 32-bit addresses, while the newer IPv6 uses 128-bit addresses to accommodate the increasing number of devices. Data is transmitted in packets, each containing an IP header with sender and receiver IP addresses, guided by the Internet Protocol (IP). The application layer then adds data specific to the application protocol (e.g., HTTP).

  • Transport Layer: TCP and UDP:
    • TCP (Transmission Control Protocol): Operates at the transport layer, ensuring reliable communication through connection establishment (three-way handshake), ordered delivery (sequence numbers), and error checking.
    • UDP (User Datagram Protocol): Offers faster but less reliable communication as it doesn’t establish a connection or guarantee delivery, making it suitable for time-sensitive applications like video calls and live streaming where some data loss is acceptable.
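
To make the TCP/UDP contrast concrete, here is a minimal sketch using Python’s standard socket module. For illustration it runs both ends of each exchange on localhost inside one process, which is an artificial but runnable setup:

```python
import socket

# --- TCP: connection-oriented, ordered, reliable ---
tcp_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_server.bind(("127.0.0.1", 0))            # port 0 lets the OS pick a free port
tcp_server.listen(1)

tcp_client = socket.create_connection(tcp_server.getsockname())  # three-way handshake
conn, _addr = tcp_server.accept()
tcp_client.sendall(b"hello over TCP")        # delivery and ordering are guaranteed
print(conn.recv(1024))                       # b'hello over TCP'
for s in (conn, tcp_client, tcp_server):
    s.close()

# --- UDP: connectionless, no delivery guarantee ---
udp_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_receiver.bind(("127.0.0.1", 0))
udp_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_sender.sendto(b"hello over UDP", udp_receiver.getsockname())  # fire and forget
print(udp_receiver.recvfrom(1024)[0])        # usually arrives, but nothing enforces it
udp_sender.close()
udp_receiver.close()
```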

  • DNS (Domain Name System): Acts as the internet’s phonebook, translating human-friendly domain names into IP addresses. This process is overseen by ICANN, which accredits domain name registrars. DNS uses various records like A records (domain to IPv4) and AAAA records (domain to IPv6).
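
The small sketch below performs a DNS lookup using only Python’s standard library; example.com is a placeholder domain, and the answers depend on your resolver and network:

```python
import socket

domain = "example.com"  # placeholder domain

# Resolve A records (IPv4) and AAAA records (IPv6) through the system resolver.
for family, record in [(socket.AF_INET, "A (IPv4)"), (socket.AF_INET6, "AAAA (IPv6)")]:
    try:
        infos = socket.getaddrinfo(domain, None, family)
        addresses = sorted({info[4][0] for info in infos})
        print(f"{record} records for {domain}: {addresses}")
    except socket.gaierror as exc:
        print(f"{record} lookup failed: {exc}")
```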

  • Networking Infrastructure: Devices on a network have either public (unique across the internet) or private (unique within a local network) IP addresses. IP addresses can be static (permanently assigned) or dynamic (changing over time). Firewalls protect networks by monitoring and controlling incoming and outgoing traffic. Within a device, specific processes are identified by ports, creating a unique identifier for a network service when combined with an IP address.

Speaking the Same Language: Essential Application Layer Protocols

The application layer defines how applications communicate over the network.

  • HTTP (Hypertext Transfer Protocol): Built on TCP/IP, HTTP is a stateless request-response protocol used for web browsing. Each request contains all necessary information in headers (including URL and method) and an optional body. Responses include status codes indicating the outcome of the request (e.g., 200 series for success, 400 series for client errors, 500 series for server errors). Common HTTP methods include GET (fetching), POST (creating), PUT/PATCH (updating), and DELETE (removing).
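
The hedged sketch below sends a single GET request with Python’s standard http.client to show the method, headers, and status code in practice; example.com is a placeholder host, and a working network connection is assumed:

```python
import http.client

# Open a TLS connection and send one stateless GET request.
conn = http.client.HTTPSConnection("example.com", timeout=5)
conn.request("GET", "/", headers={"Accept": "text/html"})

response = conn.getresponse()
print(response.status, response.reason)     # e.g. 200 OK (2xx = success)
print(response.getheader("Content-Type"))   # response headers describe the body
body = response.read()                      # the optional response body
print(len(body), "bytes received")
conn.close()
```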

  • WebSockets: For real-time updates, WebSockets provide a two-way communication channel over a single, long-lived connection, allowing servers to push updates to clients without repeated HTTP requests. This is crucial for applications like chat, live sports updates, and stock market feeds.

  • Email Protocols:
    • SMTP (Simple Mail Transfer Protocol): The standard for sending email messages between servers.
    • IMAP (Internet Message Access Protocol): Used to retrieve emails from a server, allowing access from multiple devices.
    • POP3 (Post Office Protocol version 3): Used for downloading emails from a server to a local client, typically for single-device access.

  • File Transfer and Management:
    • FTP (File Transfer Protocol): A traditional protocol for transferring files between a client and server.
    • SSH (Secure Shell): Used for securely operating network services on unsecured networks, commonly for remote login and file transfer.

  • Real-Time Communication:
    • WebRTC (Web Real-Time Communication): Enables browser-to-browser applications for voice calling, video chat, and file sharing without plugins.
    • MQTT (Message Queuing Telemetry Transport): A lightweight messaging protocol ideal for low-bandwidth scenarios and devices with limited processing power, such as IoT devices.
    • AMQP (Advanced Message Queuing Protocol): A protocol for message-oriented middleware, providing robustness and security for enterprise-level message communication (e.g., RabbitMQ).

  • RPC (Remote Procedure Call): Allows a program on one computer to execute code on a remote server or another computer as if it were a local function call, abstracting network communication details. Many application layer protocols utilize RPC mechanisms.
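
As a minimal illustration of a remote call that looks local, the sketch below uses Python’s built-in xmlrpc modules; this is just one of many RPC mechanisms (gRPC and JSON-RPC follow the same idea), and the add function is an arbitrary example:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: expose an ordinary function over RPC.
def add(a: int, b: int) -> int:
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)  # port 0 = any free port
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the remote function is invoked like a local one.
host, port = server.server_address
proxy = ServerProxy(f"http://{host}:{port}/")
print(proxy.add(2, 3))   # 5 -- the network round trip is hidden behind an ordinary call

server.shutdown()
```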

Crafting Seamless Interactions: API Design

Designing effective APIs (Application Programming Interfaces) is crucial for enabling communication between different software components.

  • API Design Basics: This involves defining the inputs (data sent to the API) and outputs (data returned by the API) for various operations. The focus is on exposing CRUD (Create, Read, Update, Delete) operations on data.

  • Communication Protocols and Data Transport: Decisions need to be made regarding the communication protocol (HTTP, WebSockets, etc.) and the data format (JSON, XML, Protocol Buffers) to be used.

  • API Paradigms:
    • REST (Representational State Transfer): A stateless architecture using standard HTTP methods (GET, POST, PUT, DELETE) and commonly JSON for data exchange. While easily consumable, it can lead to over-fetching or under-fetching of data.
    • GraphQL: Allows clients to request exactly the data they need, avoiding over-fetching and under-fetching. Uses strongly typed queries, but complex queries can impact server performance, and all requests are typically POST requests with a 200 status code even for errors.
    • gRPC (Google Remote Procedure Call): Built on HTTP/2, offering features like multiplexing and server push, and uses Protocol Buffers for efficient data serialization. It’s bandwidth and resource-efficient, especially for microservices, but less human-readable than JSON and requires HTTP/2 support.

  • Designing Endpoints: API endpoints should reflect the relationships between data (e.g., /users/{user_id}/orders) and support common queries like pagination (limit, offset) and filtering (e.g., by date range).
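
Here is a rough sketch of how an endpoint handler might apply limit/offset pagination and a date-range filter; the in-memory ORDERS list and its field names are invented purely for illustration:

```python
from datetime import date

# Hypothetical data standing in for one user's orders in a database.
ORDERS = [
    {"id": 1, "placed_on": date(2024, 1, 5),  "total": 30.0},
    {"id": 2, "placed_on": date(2024, 2, 17), "total": 12.5},
    {"id": 3, "placed_on": date(2024, 3, 2),  "total": 99.9},
]

def list_orders(limit=10, offset=0, start=None, end=None):
    """Illustrative handler for GET /users/{user_id}/orders?limit=&offset=&start=&end="""
    results = ORDERS
    if start is not None:
        results = [o for o in results if o["placed_on"] >= start]
    if end is not None:
        results = [o for o in results if o["placed_on"] <= end]
    return results[offset:offset + limit]   # pagination is applied after filtering

print(list_orders(limit=2, start=date(2024, 2, 1)))   # orders 2 and 3
```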

  • Idempotency: GET requests should be safe and idempotent: making the same request multiple times yields the same result and causes no side effects. Use GET only for data retrieval, never for modification; use POST to create resources and PUT (which is itself idempotent) to update them.

  • Backward Compatibility: When modifying endpoints, it’s essential to maintain backward compatibility to avoid breaking existing clients, often by introducing new API versions (e.g., /api/v2/products). In GraphQL, adding new fields without removing old ones helps achieve this.

  • Rate Limiting: Implementing rate limits prevents abuse and denial-of-service (DoS) attacks by controlling the number of requests a user can make within a certain timeframe.
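
One common way to implement rate limiting is a token bucket. The sketch below is a minimal single-process version; per-client buckets, thread safety, and state shared across servers are deliberately left out:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1.0)    # burst of 5, then 1 request per second
print([bucket.allow() for _ in range(7)])     # first 5 True, then False, False
```

In a real API the limiter would typically keep one bucket per client IP or API key and answer rejected requests with HTTP status 429 (Too Many Requests).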

  • CORS (Cross-Origin Resource Sharing): Configuring CORS settings controls which domains are allowed to access the API, preventing unwanted cross-site interactions.

Accelerating Delivery: Caching and Content Delivery Networks

Minimizing request latency, especially for users geographically distant from servers, is crucial for a good user experience. Caching and Content Delivery Networks (CDNs) are key strategies for this.

  • Caching: A technique to improve performance by storing copies of frequently accessed data in temporary storage for faster retrieval.
    • Browser Caching: Stores website resources on a user’s local computer, allowing the browser to load sites faster on revisits. Controlled by the Cache-Control header. A cache hit occurs when data is found in the cache, while a cache miss necessitates fetching from the original source. The cache hit ratio indicates how effective the cache is.
    • Server Caching: Stores frequently accessed data on the server-side (in memory like Redis or on disk), reducing the need for expensive operations like database queries. Strategies include:
      • Write-Around Cache: Data is written directly to permanent storage, bypassing the cache (useful when newly written data is unlikely to be read again soon).
      • Write-Through Cache: Data is written simultaneously to the cache and permanent storage (ensures data consistency but can be slower).
      • Write-Back Cache: Data is first written to the cache and then to permanent storage later (improves write performance but risks data loss in case of server crash).
      • Eviction Policies: Rules for removing items from a full cache (e.g., Least Recently Used – LRU, First In First Out – FIFO, Least Frequently Used – LFU).
    • Database Caching: Caching database query results, either within the database system or using an external caching layer (e.g., Redis, Memcached), to improve performance for read-heavy applications. Uses similar eviction policies as server-side caching.
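
To tie these ideas together, here is a hedged sketch of a write-through cache with LRU eviction, where a plain dict stands in for the slow backing store (in practice the store would be a database and the cache something like Redis or Memcached):

```python
from collections import OrderedDict

class WriteThroughLRUCache:
    """Write-through: every write goes to the cache and the backing store together.
    LRU eviction: when the cache is full, the least recently used entry is dropped."""

    def __init__(self, capacity: int, backing_store: dict):
        self.capacity = capacity
        self.store = backing_store        # stand-in for the database or disk
        self.cache = OrderedDict()

    def put(self, key, value):
        self.store[key] = value           # write-through to permanent storage
        self.cache[key] = value
        self.cache.move_to_end(key)       # mark as most recently used
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry

    def get(self, key):
        if key in self.cache:             # cache hit
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store[key]           # cache miss: fall back to the slow store
        self.cache[key] = value           # populate the cache for next time
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return value

db = {}
cache = WriteThroughLRUCache(capacity=2, backing_store=db)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)                          # "a" is evicted from the cache...
print("a" in cache.cache, db["a"])         # False 1 ...but still safe in the store
```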

  • Content Delivery Networks (CDNs): A network of geographically distributed servers used to serve static content (JavaScript, HTML, CSS, images, videos). CDNs cache content from the origin server and deliver it to users from the nearest CDN server.
    • Pull-Based CDN: The CDN automatically pulls content from the origin server when first requested by a user (ideal for frequently updated static content).
    • Push-Based CDN: Content is uploaded to the origin server and then distributed to the CDN (useful for large, infrequently updated files).
    • CDNs improve performance by reducing latency, enhancing availability and scalability, and improving security through features like DDoS protection. However, dynamic content or operations requiring server-side logic still need to hit the origin server.

  • Benefits of Caching: Overall, caching leads to reduced latency, lower server load, and faster load times, resulting in a better user experience.

Acting as Intermediaries: Proxy Servers

Proxy servers act as intermediaries between clients and servers, serving various purposes.

  • Types of Proxy Servers:
    • Forward Proxy: Sits in front of clients to send requests to other servers on the internet (used for internet access control, anonymization, caching).
    • Reverse Proxy: Sits in front of one or more web servers, intercepting requests from the internet (used for load balancing, web acceleration, security layer). CDNs are a type of reverse proxy.
    • Open Proxy: Allows any user to connect and utilize the proxy (often for anonymizing browsing).
    • Transparent Proxy: Passes requests and resources without modification but is visible to the client (used for caching and content filtering).
    • Anonymous Proxy: Identifiable as a proxy but doesn’t reveal the original IP address (used for anonymous browsing).
    • Distorting Proxy: Provides an incorrect original IP address (similar to anonymous but with misinformation).
    • High Anonymity Proxy (Elite Proxy): Makes proxy detection very difficult, ensuring maximum anonymity.

  • Use Cases of Forward Proxies:
    • Instagram Proxies: Used to manage multiple accounts without triggering bans.
    • Internet Use Control and Monitoring: Organizations use them to monitor and control employee internet access.
    • Caching Frequently Accessed Content: Reduces bandwidth usage and speeds up access within a network.
    • Anonymizing Web Access: Hides IP addresses for privacy.

  • Use Cases of Reverse Proxies:
    • Load Balancers: Distribute incoming traffic across multiple servers.
    • CDNs: Deliver cached static content to users based on geographical location.
    • Web Application Firewalls (WAFs): Inspect incoming traffic to block hacking attempts.
    • SSL Offloading/Acceleration: Handle SSL/TLS encryption and decryption, optimizing web server performance.

Balancing the Load: Ensuring Optimal Performance

Load balancers are crucial for distributing incoming network traffic across multiple servers to prevent any single server from being overwhelmed, increasing capacity and reliability.

  • Common Load Balancing Strategies and Algorithms:
    • Round Robin: Distributes requests sequentially to each server.
    • Least Connections: Directs traffic to the server with the fewest active connections.
    • Least Response Time: Chooses the server with the lowest response time and fewest active connections.
    • IP Hashing: Directs requests to the same server based on the client’s IP address hash (for session persistence).
    • Weighted Algorithms (e.g., Weighted Round Robin, Weighted Least Connections): Assign weights to servers based on their capacity to handle more requests.
    • Geographical Algorithms: Direct requests to the geographically closest server.
    • Consistent Hashing: Maps both servers and request keys onto a hash ring so that each key consistently lands on the same server, and only a small share of keys has to move when servers are added or removed.
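
The sketch below implements a bare-bones consistent hash ring (no virtual nodes or server weights) to show why only a fraction of keys move when a server joins:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers):
        self.ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first server at or after the key's position on the ring.
        positions = [h for h, _ in self.ring]
        index = bisect.bisect(positions, self._hash(key)) % len(self.ring)
        return self.ring[index][1]

    def add_server(self, server: str):
        bisect.insort(self.ring, (self._hash(server), server))

ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.add_server("server-4")
moved = sum(1 for k in keys if ring.server_for(k) != before[k])
print(f"{moved} of {len(keys)} keys moved")   # only a fraction; a naive hash(key) % N would move most
```
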
  • Health Checking: Load balancers continuously monitor the health of servers and only direct traffic to online and responsive ones.

  • Forms of Load Balancers: Hardware (e.g., F5 BIG-IP, Citrix), software (e.g., HAProxy, Nginx), cloud-based (e.g., AWS ELB, Azure Load Balancer, Google Cloud Load Balancer), and virtual (e.g., VMware Advanced Load Balancer).

  • Handling Load Balancer Failure: To avoid a single point of failure:
    • Redundant Load Balancing (Failover): Using multiple load balancers where one takes over if the primary fails.
    • Continuous Monitoring and Health Checks: Detecting and addressing issues early.
    • Auto-scaling and Self-healing Systems: Automatically replacing a failed load balancer.
    • DNS Failover: Rerouting traffic from the failed load balancer’s IP address to a standby IP with a new load balancer.

The Heart of Data Management: Databases in System Design

Databases are essential for storing and managing application data.

  • Types of Databases:
    • Relational Databases (SQL): Use tables with structured data and SQL as the query language (e.g., PostgreSQL, MySQL, SQLite). They excel at transactions, complex queries, and maintaining data integrity through ACID properties: Atomicity (all or nothing transaction), Consistency (database remains in a valid state), Isolation (transactions are independent), and Durability (committed data persists).
    • NoSQL Databases: More flexible, often schema-less, and good for unstructured data, scalability, and quick iteration (e.g., MongoDB (document-based), Cassandra (column-family), Redis (key-value), Neo4j (graph)). They often relax the consistency property of ACID.
    • In-Memory Databases: Store data in RAM for lightning-fast retrieval (e.g., Redis, Memcached), primarily used for caching and session storage.
  • Scaling Databases:
    • Vertical Scaling (Scale Up): Improving a single server’s resources (CPU, RAM, disk); simple, but it eventually hits a hardware ceiling.
    • Horizontal Scaling (Scale Out): Adding more machines to distribute the data and load. Techniques include:
      • Database Sharding: Distributing different portions (shards) of the dataset across multiple servers based on strategies like range-based, directory-based, or geographical sharding.
      • Data Replication: Keeping copies of data on multiple servers for high availability (Master-Slave, Master-Master).
  • Database Performance Techniques:
    • Caching: Using in-memory databases like Redis to cache frequent queries.
    • Indexing: Creating indexes on frequently accessed columns to speed up retrieval times (see the short sketch after this list).
    • Query Optimization: Minimizing joins and using query analysis tools to improve performance.
  • CAP Theorem in Databases: Remember the CAP theorem when designing distributed database systems, prioritizing two of the three properties (Consistency, Availability, Partition Tolerance) based on application requirements.
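
As a small, concrete footnote to the ACID and indexing points above, this sketch uses Python’s built-in sqlite3 module with a throwaway in-memory database; the table and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

# Atomicity: both inserts commit together, or neither does.
try:
    with conn:   # the connection acts as a transaction context manager
        conn.execute("INSERT INTO orders (id, user_id, total) VALUES (1, 1, 25.0)")
        conn.execute("INSERT INTO orders (id, user_id, total) VALUES (1, 2, 40.0)")  # duplicate id
except sqlite3.IntegrityError:
    pass   # the whole transaction was rolled back as a unit

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])   # 0 -- the first insert was undone too

# Indexing: speed up the frequent "orders for a given user" lookup.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = ?", (1,)).fetchall()
print(plan)   # the plan mentions idx_orders_user_id instead of a full table scan
conn.close()
```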

FAQ

What are the fundamental hardware components of a computer that are important to understand before designing large-scale systems?

Understanding the individual computer’s architecture is crucial. Key components include:

  • CPU (Central Processing Unit): The “brain” of the computer that fetches, decodes, and executes instructions. It processes the operations of your code after it’s compiled into machine code.
  • RAM (Random Access Memory): Volatile primary active data storage for currently used data, variables, and applications, allowing for quick read and write access.
  • Cache (L1, L2, L3): Smaller, faster memory than RAM that stores frequently used data to reduce the average time to access data and optimize CPU performance.
  • Disk Storage (HDD or SSD): Non-volatile storage for the operating system, applications, and all user files. SSDs offer significantly faster data retrieval speeds compared to HDDs.
  • Motherboard: The main circuit board that connects all the computer’s components and provides pathways for data flow.

What are the key elements of a high-level architecture for a production-ready application?

A typical production-ready application architecture includes:

  • CI/CD Pipeline: Automates the process of integrating code from the repository through testing and deployment to production servers without manual intervention.
  • Load Balancers and Reverse Proxies: Distribute user requests evenly across multiple servers to maintain a smooth user experience during high traffic.
  • External Storage Server: A dedicated server connected over a network for storing application data, separate from the production servers.
  • Logging and Monitoring Systems: Track every micro-interaction, store logs on external services, and analyze data to ensure smooth operation.
  • Alerting Service: Notifies developers of failing requests or anomalies detected by the logging systems, often integrated with communication platforms like Slack.

What are the three pillars of system design that contribute to a robust and resilient application?

The three key elements of good system design are:

  • Scalability: The system’s ability to grow and handle an increasing user base and data load efficiently.
  • Maintainability: Ensuring the system is designed in a way that future developers can easily understand, modify, and improve it.
  • Efficiency: Making the best possible use of available resources (CPU, memory, network, storage) to achieve optimal performance.

Explain the CAP Theorem and its implications for distributed system design.

The CAP Theorem (Brewer’s Theorem) states that a distributed system can only simultaneously guarantee two out of the following three properties:

  • Consistency: All nodes in the system have the same data at the same time.
  • Availability: The system is always operational and responsive to requests.
  • Partition Tolerance: The system continues to function even if there are network disruptions (partitions) between nodes.

Designing a distributed system requires making trade-offs between these properties based on the specific use case. For example, a banking system might prioritize consistency and partition tolerance over absolute availability during network issues.

What are some key metrics used to measure the performance and reliability of a system?

Several metrics are crucial for evaluating system performance and reliability:

  • Availability: The percentage of time the system is operational and accessible to users. Service Level Objectives (SLOs) set goals for availability, while Service Level Agreements (SLAs) are formal contracts with users defining the minimum service level, potentially including compensations for downtime.
  • Reliability: Ensuring the system works correctly and consistently over time.
  • Fault Tolerance: The system’s ability to handle unexpected failures or attacks and continue functioning.
  • Redundancy: Having backup systems in place to take over in case of primary component failure.
  • Throughput: The amount of data or requests the system can handle over a specific period (e.g., requests per second, queries per second, bytes per second).
  • Latency: The time it takes for a request to be processed and a response to be received.

How do caching and Content Delivery Networks (CDNs) improve system performance?

  • Caching: Stores copies of frequently accessed data in temporary storage (browser, server, database) to serve future requests faster. This reduces latency, lowers the load on the original data source (like the database), and improves overall response times. Different caching strategies exist, such as write-around, write-through, and write-back. Eviction policies manage what data to remove when the cache is full (e.g., LRU, FIFO, LFU).
  • Content Delivery Networks (CDNs): A network of geographically distributed servers that cache static content (HTML, CSS, JavaScript, images, videos) closer to users. When a user requests this content, it’s served from the nearest CDN server, reducing latency and improving loading times, especially for users located far from the origin server. CDNs also enhance availability and can provide security benefits.

What is the role of proxy servers and load balancers in system architecture?

  • Proxy Servers: Act as intermediaries between clients and servers for various purposes:
    • Forward Proxies: Sit in front of clients to control internet access, anonymize requests, cache content, and monitor usage.
    • Reverse Proxies: Sit in front of web servers to provide load balancing, web acceleration (caching, compression, SSL offloading), and security (e.g., Web Application Firewalls).
  • Load Balancers: Distribute incoming network traffic across multiple servers to prevent any single server from being overwhelmed. This increases application capacity, reliability, and availability. They use various algorithms for traffic distribution (e.g., round robin, least connections, least response time, IP hashing, weighted methods, geographical algorithms, consistent hashing) and perform health checks to ensure traffic is only sent to healthy servers. Redundant load balancer setups are crucial to avoid a single point of failure.

Conclusion: Embracing the Complexity of System Design

Designing robust and scalable systems is a multifaceted discipline requiring a solid understanding of various concepts, from the fundamental workings of a computer to sophisticated networking and data management strategies. By grasping these core principles and the tradeoffs involved in different design decisions, you can effectively architect applications that can handle the demands of modern software development and excel in system design discussions. Remember that continuous learning and exploration are key to mastering this ever-evolving field.
