Tuning Tips to Maximize Postgres Performance

Table of Contents

1 Overview
2 Hardware Design:
3 Virtual Machine Considerations:
4 Operating System Tuning:
5 Postgres Parameter Tuning:
6 WAL (Write-Ahead Log) Settings:
7 Query Planning Parameters:
8 Client Connection Defaults:
9 Autovacuum Settings:
10 Reporting and Logging:
11 Query Tuning and Analysis:
12 Query Rewriting Tips:
13 Performance Pitfalls to Avoid:
14 Partitioning in Postgres:
15 PostgreSQL Performance Tuning FAQ
16 Postgres Performance Tuning and Optimization Study Guide
17 Conclusion:

Overview

This blog post summarizes key insights from a webinar, “Tuning Tips to Maximize Postgres Performance,” hosted by EDB. The webinar, featuring Dave Page, VP and Chief Architect of Database Infrastructure, and Devrim Gundus, an EDB consultant, offers a deep dive into optimizing Postgres performance, from hardware considerations to query tuning techniques.

If you prefer to watch the video instead, here is the link

Hardware Design:

CPU: Opt for CPUs with larger Level 3 caches, as this shared cache between CPU cores enables faster data processing.
Disks: Consider the application’s read/write workload when choosing between SSDs and HDDs. Prioritize RAID configurations (RAID 1 for transaction logs, RAID 10 for data) for performance, backup, and consistency. Leverage tablespaces to strategically store data on drives with varying speeds, pushing less frequently accessed data to slower drives.
RAM: Prioritize RAM upgrades as the most cost-effective performance booster. Abundant RAM facilitates larger caches, minimizing I/O operations and enhancing transaction handling and connection capacity. Explore hot-swap RAM for seamless upgrades and replacements, albeit at a higher cost.
Network: A faster network speeds up data transfer and replication. Factor in backup needs when planning network capacity to avoid bottlenecks during backup operations.

Virtual Machine Considerations:

Virtualization Type: Favor paravirtualization, which leverages special drivers to bypass device virtualization and directly connect to the host machine’s physical devices, resulting in enhanced disk and network performance.
Dedicated Hardware: Mitigate the “noisy neighbor” problem by using dedicated hardware in cloud environments or VMware systems to ensure predictable CPU and RAM availability.
Instance Types: Carefully evaluate instance types in cloud environments, considering not only CPU core count and RAM but also network throughput, as some instance types offer limited network speeds.
NUMA Pinning: In virtualized environments with multiple NUMA nodes (partitions of CPU and memory), pin VMs to specific CPU cores to prevent performance degradation caused by process swapping between NUMA nodes.
Disk Considerations: Pre-allocate virtual disk space to minimize fragmentation on the host machine’s physical drives, especially crucial for database systems where performance is paramount. While using RAID on block devices in Amazon environments shows negligible performance gains due to remote block device limitations, explore RAID benefits in other virtual environments based on network architecture and storage allocation.
Storage Performance: Opt for dedicated or provisioned IOPS in cloud environments to guarantee a specific level of storage performance, though at an increased cost.

Operating System Tuning:

Tuned Daemon: Utilize the tuned daemon for dynamic system tuning. Configure it for Postgres by disabling transparent huge pages and enabling the performance governor for optimal CPU utilization. Remember that these changes require a reboot to take effect.
Huge Pages: Enable huge pages to allocate larger blocks of memory, minimizing memory fragmentation and improving performance as the database cache grows. Calculate the required huge pages based on shared buffer settings and update the tuned daemon configuration accordingly.
File System Optimization: Set “noatime” when mounting the file system to reduce unnecessary file access time checks, improving CPU efficiency.
File System Type: XFS, the default file system on most major operating systems, offers excellent performance for Postgres. Ensure journaling is enabled to prevent data loss. While Btrfs shows promise, it’s not yet on par with XFS performance-wise.

Postgres Parameter Tuning:

Max Connections: Minimize this value, ideally matching the number of CPU cores, to reduce overhead associated with the proc array, which tracks connection states. Consider connection pooling solutions (pgpool, pgBouncer) for managing large numbers of connections.
Shared Buffers: Allocate 30-50% of available RAM to this primary cache for storing data read from disk and updates before being written to disk.
Work Mem: Set the memory block size for sort and hash operations during query execution. Exercise caution, as multiple operations may require work_mem allocation, potentially exceeding RAM limits. Log analysis can help determine the optimal work_mem size.
Maintenance Work Mem: Allocate 1-4 GB of RAM for maintenance operations like vacuum and indexing.
Autovacuum Work Mem: Similar to maintenance_work_mem, but specifically for the autovacuum daemon. Setting it to -1 defaults to the maintenance_work_mem value.
Effective IO Concurrency: For modern SSDs and NVMe drives, set this value to a few hundred to account for their parallel processing capabilities.

WAL (Write-Ahead Log) Settings:

wal_compression: Enable to reduce I/O load by compressing WAL data before writing to disk.
wal_log_hints: Enable for improved performance with utilities like pg_rewind for backups and restores.
wal_buffers: Set to 64 MB to accommodate four 16 MB WAL files.
checkpoint_completion_target: Set to 0.9 (the default in Postgres 14 and later) to spread checkpoint activity evenly across the checkpoint interval.
checkpoint_timeout: Configure the maximum time between checkpoints, balancing recovery time with I/O load.
max_wal_size: Set a high enough value to ensure checkpoints primarily occur based on the timeout setting rather than the WAL size.

Query Planning Parameters:

seq_page_cost, random_page_cost: Adjust these settings to reflect the relative costs of sequential and random disk access, considering SSDs and NVMe drives have minimal differences in these costs.
cpu_tuple_cost: Set the relative cost of processing a single row in a query. These cost parameters influence the query planner’s decision-making process, such as choosing between sequential and index scans.
effective_cache_size: Provide a hint to the query planner regarding the expected amount of data residing in various caches (shared buffers, kernel cache), impacting cost estimations for memory-resident data access.

Client Connection Defaults:

idle_in_transaction_session_timeout: Set a timeout to terminate idle transactions, preventing resource contention and maintenance issues caused by long-held snapshots.
shared_preload_libraries: Include ‘pg_stat_statements’ for enhanced query monitoring and tuning capabilities.

Autovacuum Settings:

autovacuum_analyze_threshold, autovacuum_vacuum_threshold: Configure thresholds to trigger autovacuum operations based on table activity.
autovacuum_naptime: Set the interval between autovacuum worker checks for work.
autovacuum_vacuum_cost_delay: Introduce a delay to throttle autovacuum activity, reducing resource contention.
autovacuum_max_workers: Define the number of concurrent autovacuum worker processes.
autovacuum_cost_limit: Set a limit to prevent excessive resource consumption by autovacuum.

Reporting and Logging:

log_temp_files: Enable to track temporary file usage, particularly useful for monitoring work_mem adequacy.
log_checkpoints: Enable to monitor checkpoint activity, providing insights into timing and resource usage.
edb.enable_time_based_statistics: Enable this setting in EDB Postgres Advanced Server to leverage the Dynamic Runtime Instrumentation Tools Architecture (DRITA) for comprehensive system performance logging.

Query Tuning and Analysis:

Slow Query Identification: Use log_min_duration_statement to log queries exceeding a specified duration, helping pinpoint performance bottlenecks.
pg_stat_statements: Leverage this extension to analyze query performance and resource usage.
pgBadger: Employ this open-source tool to analyze Postgres logs and identify performance issues and trends.
Explain Analyze: Utilize the EXPLAIN ANALYZE command within a transaction block to gain insights into query execution plans, identifying slow operations and areas for optimization. Graphical tools like pgAdmin and PEM offer visual representations of EXPLAIN ANALYZE output.

Query Rewriting Tips:

Naked Columns: Use column names directly in WHERE clauses instead of expressions to facilitate index usage.
CTEs (Common Table Expressions): Postgres 12 and later versions offer improved CTE performance.
Group By Optimization: Group by the least complex data types first for increased efficiency.

Performance Pitfalls to Avoid:

Outdated Statistics: Ensure statistics are up-to-date using ANALYZE or autoanalyze to prevent inaccurate query plans.
External Sorts: Increase work_mem to avoid disk-based sorting.
Hash Operations: Analyze and optimize queries involving hash joins, hash aggregates, and bitmap scans.
Suboptimal Plan Shapes: Use EXPLAIN ANALYZE to identify and address inefficient query plans.

Partitioning in Postgres:

Benefits: Facilitates data aging, streamlines maintenance tasks, enables parallel query execution, and enhances overall performance.
Types: Range, List, and Hash partitioning are available in Postgres.
Automatic Partitioning (EDB Postgres Advanced Server): Simplify partition management with automatic partition creation based on insert values, especially for list partitioning.

PostgreSQL Performance Tuning FAQ

1. What are the key hardware considerations for optimal PostgreSQL performance?

CPU: Prioritize CPUs with larger Level 3 caches for better data sharing between cores.
Disks: Choose disks based on your workload (read-intensive, write-intensive, or mixed). Use RAID 1 for transaction logs and RAID 10 for data for both performance and backup purposes. Consider using tablespaces to move less frequently accessed data (like archives) to slower drives.
RAM: Invest in as much RAM as possible. RAM is the cheapest component to scale, and more RAM translates to larger caches, reducing I/O dependence and allowing for more connections and transactions.
Network: A faster network equates to faster data transfer and replication. Ensure sufficient network bandwidth between databases and application servers, especially for backups.

2. How can I tune my virtual machine environment for PostgreSQL?

Virtualization Type: If possible, use paravirtualization to offload disk and network operations to the host machine for better performance.
Dedicated Hardware: Opt for dedicated hardware to avoid “noisy neighbors” and ensure predictable CPU and RAM resources.
Instance Type Selection: Carefully choose instance types with suitable vCPU cores, RAM, and network throughput for your workload.
NUMA Pinning: Consider pinning VMs to specific CPU cores in NUMA-aware environments like IBM Power to prevent performance degradation from process swapping between NUMA nodes.
Pre-allocated Disks: Pre-allocate virtual disk space to minimize fragmentation and performance issues as the database grows.
RAID Configuration: Test the impact of using software RAID (like Linux MD RAID) in your virtual environment. It may not offer noticeable performance benefits in all cases.
Provisioned IOPS: Consider paying for dedicated or provisioned IOPS for storage devices to guarantee a specific level of performance, especially in cloud environments.

3. What are the essential operating system tuning recommendations for PostgreSQL?

Tuned Daemon: Utilize the tuned daemon (available on major operating systems) to optimize system settings for PostgreSQL. Create a custom profile (edb-postgres) that disables transparent huge pages and sets the CPU governor to performance. Remember to reboot the machine after applying changes.
Huge Pages: Enable huge pages to allocate larger memory blocks, reducing memory fragmentation and improving performance, especially as the database cache grows. Calculate the required huge pages based on shared_buffers and set the value using the tuned daemon.
File System Optimization: Mount file systems with noatime to avoid unnecessary file access time updates.
File System Type: Choose xfs as the preferred file system due to its performance and popularity, and ensure journaling is enabled for data integrity.

4. What are the key PostgreSQL configuration parameters (GUCs) to adjust for performance?

max_connections: Set this to a reasonable number based on expected concurrent connections and the number of CPU cores. Avoid excessively high values, especially without connection pooling.
shared_buffers: Allocate 30-50% of available RAM to this parameter for the main PostgreSQL cache.
work_mem: Set a sensible value for sort and hash operations, considering that it can be allocated multiple times for complex queries. Monitor for “temp file” creation to adjust as needed.
maintenance_work_mem: Allocate 1-4 GB for maintenance operations like vacuum and indexing.
effective_io_concurrency: Set to a few hundred for modern SSDs and NVMe drives to utilize their parallel I/O capabilities.
wal_compression: Enable WAL compression to reduce I/O overhead.
checkpoint_completion_target: Set to 0.9 (90%) to spread checkpoint activity over the checkpoint interval.

5. How can I identify and tune slow queries in PostgreSQL?

Enable Slow Query Logging: Set log_min_duration_statement to capture queries exceeding a specific duration threshold.
Analyze Query Plans: Utilize EXPLAIN ANALYZE within a transaction block to understand query execution plans, identifying bottlenecks like slow joins, sorts, or inefficient index usage.
Leverage Monitoring Tools: Utilize tools like pg_stat_statements, pgBadger, and pgAdmin to gain insights into query performance and identify areas for improvement.

6. What are common query patterns to avoid or optimize in PostgreSQL?

Expressions in WHERE Clauses: Use “naked columns” (direct column references) instead of expressions to allow index usage.
NOT IN and Correlated Subqueries: Consider alternative approaches like LEFT JOIN and NOT EXISTS for potentially better performance.
CTEs Before PostgreSQL 12: Be cautious when using CTEs in older versions due to potential planner limitations.
GROUP BY Ordering: Group by least complex data types before more complex ones for efficiency.

7. What are the benefits of partitioning in PostgreSQL?

Simplified Maintenance: Easier data aging, selective backups, and index maintenance.
Performance Optimization: Run parallel queries across partitions and improve query performance by targeting specific data subsets.
Enhanced Data Management: Organize large datasets based on logical criteria (date, region, etc.) for easier management.

Postgres Performance Tuning and Optimization Study Guide

Short-Answer Quiz

Instructions: Answer the following questions in 2-3 sentences each.

Why is selecting the right CPU important for Postgres performance, specifically concerning level 3 caches?
Explain the advantages and disadvantages of using RAID 10 for data storage in Postgres.
Why is it generally recommended to invest in as much RAM as possible when designing a Postgres system?
What are noisy neighbors in virtualized environments and how can their impact be minimized?
Describe the difference between para-virtualization and full virtualization, and its relevance to Postgres performance.
Explain the concept of NUMA pinning and in what scenarios it might be beneficial for Postgres.
Why is pre-allocating disk space for virtual machines important for Postgres performance?
What is the purpose of the “tuned” daemon and how can it be configured to optimize Postgres performance?
Describe the benefits of using huge pages in Postgres and explain the process of enabling them.
Why is it important to disable access time updates for the Postgres data directory file system?

Short-Answer Quiz Answer Key

The level 3 cache is shared among CPU cores, facilitating data sharing. Larger L3 caches can handle more data, leading to faster processing and improved performance.
RAID 10 provides both data redundancy and performance improvements through striping. However, it requires more storage space compared to other RAID levels.
More RAM allows for larger shared buffers, effectively caching more data and reducing disk I/O, resulting in significant performance gains.
Noisy neighbors are other VMs on the same physical host competing for resources. Their impact can be minimized by using dedicated hardware or choosing instance types with appropriate resource allocation.
Para-virtualization utilizes specialized drivers to offload tasks like disk and network operations to the host, reducing virtualization overhead. Full virtualization emulates all hardware, leading to slower performance.
NUMA pinning assigns VMs to specific CPU cores and memory nodes, preventing performance degradation caused by processes being swapped between NUMA nodes on systems with multiple NUMA nodes.
Pre-allocation prevents disk fragmentation that can occur as the virtual disk dynamically expands, ensuring consistent disk I/O performance.
The “tuned” daemon dynamically tunes system settings. For Postgres, it can be configured to disable transparent huge pages, enable the performance CPU governor, and optimize kernel parameters for better database performance.
Huge pages allocate larger memory blocks, reducing memory fragmentation and improving performance. To enable them, calculate the required huge pages based on shared buffers, configure the “tuned” daemon, enable huge pages in Postgres, and restart the database server.
Postgres does not utilize access time information. Disabling access time updates reduces unnecessary disk writes, saving CPU cycles and improving overall system performance.

Glossary of Key Terms

TermDefinitionLevel 3 Cache (L3) A type of CPU cache shared among all cores, holding frequently accessed data to accelerate processing. Larger L3 caches improve performance by reducing the need to fetch data from main memory.

RAID Redundant Array of Independent Disks. A technology that combines multiple physical disks into a logical unit, offering data redundancy and/or performance improvements.

Shared Buffers The main memory area used by Postgres to cache data, reducing disk I/O and improving query performance. Larger shared buffers typically lead to better performance.

Noisy Neighbor A virtual machine or application running on the same physical host, competing for resources and potentially impacting the performance of other VMs.

Para-virtualization A virtualization technique that utilizes specialized drivers to offload specific tasks to the host machine, reducing virtualization overhead and improving performance.

NUMA Pinning Assigning a VM to specific CPU cores and memory nodes on a system with Non-Uniform Memory Access (NUMA) architecture. This improves performance by preventing processes from being swapped between nodes.

Huge Pages Larger memory pages used to reduce memory fragmentation and improve performance, especially beneficial for databases handling large datasets.

Tuned Daemon A service that dynamically tunes system settings based on usage patterns and workload profiles. It can be configured to optimize for Postgres performance.

Explain Analyze A Postgres command that executes a query and provides detailed information about the query plan, execution time, and resource utilization. It is a crucial tool for query optimization.

Partitioning Dividing large tables into smaller, more manageable parts. This improves query performance, simplifies data management, and facilitates data aging.

Work Mem The amount of memory allocated for sort and hash operations within a query. Increasing this value can improve the performance of complex queries.

Effective Cache Size A configuration parameter that hints to the query planner about the amount of data likely to be in RAM, including shared buffers and kernel caches.

Write-Ahead Log (WAL)A log file that records all changes made to the database, ensuring data durability and crash recovery.

Checkpoint A process that writes modified data pages from shared buffers to the disk, ensuring data consistency and recovery points.

pg_stat_statements A Postgres extension that collects and provides statistics on query execution, aiding in performance analysis and query tuning.

pgbadger An open-source tool for analyzing Postgres log files, providing insights into query performance, slow queries, and other database activity.

CTE (Common Table Expression)A temporary, named result set defined within a query, allowing for modularity and code reusability.

Conclusion:

Tuning Postgres for optimal performance is an ongoing process that involves continuous monitoring, analysis, and adjustments. Keeping the operating system, Postgres configuration, and queries in harmony is crucial for achieving and maintaining peak database performance. Stay tuned to new Postgres releases and best practices to ensure your database remains in top shape.

Đạt Trần

I build softwares that solve problems. I also love writing/documenting things I learn/want to learn.