PostgreSQL 18 introduces one of the most significant architectural changes in years: asynchronous I/O (AIO). With AIO, PostgreSQL gains more control over how storage I/O is scheduled, enabling better performance and utilization of modern hardware.
If you haven’t yet caught up, we covered the v18 release announcement last week.
This tutorial will help you understand the basics of AIO in PostgreSQL 18 and show you how to tune it for your environment. We’ll focus on the two most important parameters: io_method and io_workers.
Traditionally, PostgreSQL backends performed I/O synchronously: when a backend needed a page, it read it directly, blocking until the operation completed. AIO changes this by introducing ways to schedule reads asynchronously, allowing the database to overlap work and distribute the cost of checksumming, copying, and I/O handling across processes.
Currently in PostgreSQL 18:
- Only reads are handled by AIO.
- Some scan types (like index scans) still fall back to synchronous I/O.
- Future releases are expected to expand AIO coverage.
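If you want to see AIO at work, PostgreSQL 18 also ships a pg_aios system view listing in-flight asynchronous I/O handles. A minimal peek, best run while a large sequential scan is active in another session (the view is usually empty on an idle system):
-- In-flight asynchronous I/O handles; empty when no AIO is pending.
SELECT * FROM pg_aios LIMIT 10;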
Key Parameters
PostgreSQL 18 introduces two main GUCs (configuration parameters) for AIO:
io_method = worker # options: sync, worker, io_uring
io_workers = 3
Other AIO-related parameters exist (like io_combine_limit), but their defaults are reasonable for most workloads. Focus on the two above.
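You can check the current values, and each parameter’s change context, straight from SQL:
-- io_method requires a server restart (context = postmaster);
-- io_workers only needs a configuration reload (context = sighup).
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('io_method', 'io_workers');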
The io_method setting determines how PostgreSQL performs I/O:
- sync: Uses traditional synchronous reads with posix_fadvise (where supported). Think of this as a compatibility mode, similar to PostgreSQL 17 behavior. Useful if you need to minimize changes in behavior.
- worker (default): Spawns a pool of I/O worker processes. Backends enqueue I/O requests in shared memory; workers handle the reads and then copy data into shared buffers. This is portable and available on all platforms.
- io_uring: Uses Linux’s modern io_uring API, with each backend managing its own submission and completion queues. It can be very efficient but is Linux-only, and some container runtimes disable it due to security concerns. It also consumes more file descriptors, so you may need to increase ulimit -n.
Practical advice: stick with the default worker method unless you:
- Need legacy behavior (sync), or
- Are running on Linux and can confirm io_uring improves your workload.
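If you do decide to switch, here is a minimal sketch for changing the method from SQL. Remember that io_method only takes effect after a full server restart:
-- Written to postgresql.auto.conf; requires a restart to take effect.
ALTER SYSTEM SET io_method = 'io_uring';
-- To revert to the default:
ALTER SYSTEM SET io_method = 'worker';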
Tuning io_workers
The default is very conservative:
io_workers = 3
That works fine on small systems (e.g., Raspberry Pi or low-core VMs), but it severely underutilizes modern multi-core servers.
For example:
- On a 12-core workstation, increasing workers from 3 → 12 nearly doubled sequential scan performance in benchmarks.
- Bitmap scans also benefited significantly from having more workers.
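To run this kind of comparison yourself, time a scan-heavy query before and after changing the setting. A rough sketch, assuming a large table named big (hypothetical) and superuser rights for track_io_timing:
-- Enable per-query I/O timing (superuser) so EXPLAIN reports I/O time.
SET track_io_timing = on;
-- Force a full sequential scan; compare execution time and buffer stats across settings.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM big;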
Rule of Thumb
Set io_workers to a fraction of your CPU cores:
- Start with ~25% of total cores.
- On high-end systems, you can experiment with going as high as 1 worker per core.
- Having a few “extra” workers is safer than too few — workers are relatively cheap.
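Because io_workers only needs a reload, you can adjust it without downtime. A sketch for a 16-core server, starting at ~25% of cores:
ALTER SYSTEM SET io_workers = 4;   -- ~25% of 16 cores
SELECT pg_reload_conf();           -- no restart needed; io_workers is reload-only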
Trade-Offs and Limitations
When tuning AIO, be aware of these considerations:
- Bandwidth vs. CPU distribution: With worker, checksum verification and memory copies are spread across workers. With io_uring, this overhead stays inside each backend. On some workloads, distributing the load with workers is faster.
- Signals overhead: Worker mode uses UNIX signals for inter-process communication. Each I/O may trigger two signals (request + completion). On modern CPUs, this can handle 2–4 GB/s of small-block reads, but signals can become a bottleneck in edge cases.
- File descriptor limits (io_uring only): Because io_uring requires one FD per backend, large systems may quickly hit the per-process FD limit. If you enable io_uring, consider raising ulimit -n; see the sizing sketch after this list.
- Scope: As of PostgreSQL 18, AIO applies only to reads. Index scans and some other operations remain synchronous.
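For the FD sizing point above, the two server-side inputs are easy to read from SQL; actual usage also depends on how many files each backend holds open:
-- Upper bound on backends that might each hold an io_uring FD:
SHOW max_connections;
-- Per-backend cap PostgreSQL imposes on ordinary file descriptors:
SHOW max_files_per_process;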
Practical Recommendations
Here’s a safe, general approach:
- Leave io_method = worker. Switch to io_uring only after careful benchmarking; use sync if you need PostgreSQL 17-like behavior.
- Increase io_workers. Use ~25% of cores as a starting point:
  - 8-core server → io_workers = 2
  - 32-core server → io_workers = 8
  - 128-core server → io_workers = 32
  Experiment upward if sequential or bitmap scans are critical in your workload.
- Monitor performance. Use pg_stat_io (new in PG 16) to observe how I/O is being executed and where bottlenecks are; see the query sketch after this list.
- Report findings. AIO is brand new; if you discover interesting results, share them with the PostgreSQL community. Real-world experience will shape future defaults.
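Here is the kind of pg_stat_io query the monitoring step refers to: a simple sketch that surfaces where reads actually happen (read timings additionally require track_io_timing = on):
SELECT backend_type, object, context, reads, hits
FROM pg_stat_io
WHERE reads > 0
ORDER BY reads DESC
LIMIT 10;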
AIO in PostgreSQL 18 is a foundational step. Future versions will likely:
- Add writes to the AIO path.
- Implement adaptive worker scaling (workers spin up or down automatically).
- Improve efficiency of inter-process signaling.
Finally, here are a few config profiles you can drop into postgresql.conf (or an included .conf file), plus quick notes on when to tweak them. They are grouped by server size, with the focus on AIO alongside the usual “core” knobs.
Profile A — Small VM / dev
Assumptions: 4 vCPU, 16 GB RAM, mixed read/write OLTP
# --- Memory / cache ---
shared_buffers = 4GB
effective_cache_size = 10GB
work_mem = 16MB
maintenance_work_mem = 512MB
# --- WAL / checkpoints ---
wal_compression = on
max_wal_size = 8GB
min_wal_size = 2GB
checkpoint_timeout = '15min'
checkpoint_completion_target = 0.9
synchronous_commit = on
# --- AIO (PostgreSQL 18) ---
io_method = worker # options: sync, worker, io_uring
io_workers = 1 # ~25% of 4 cores ≈ 1
# --- Planner hints (optional) ---
random_page_cost = 1.1
effective_io_concurrency = 64 # still used by some paths
# --- Autovacuum sanity ---
autovacuum_vacuum_cost_limit = 4000
autovacuum_naptime = '10s'
When to nudge up io_workers: frequent bitmap/sequential scans, or your I/O queue looks busy while CPU is idle.
Profile B — Mid-range server
Assumptions: 16 cores, 64 GB RAM, mixed OLTP/analytics
# --- Memory / cache ---
shared_buffers = 16GB
effective_cache_size = 40GB
work_mem = 32MB
maintenance_work_mem = 2GB
# --- WAL / checkpoints ---
wal_compression = on
max_wal_size = 24GB
min_wal_size = 6GB
checkpoint_timeout = '30min'
checkpoint_completion_target = 0.9
synchronous_commit = on
# --- AIO (PostgreSQL 18) ---
io_method = worker
io_workers = 4 # ~25% of 16 cores
# --- Planner hints ---
random_page_cost = 1.1
effective_io_concurrency = 128
# --- Autovacuum ---
autovacuum_vacuum_cost_limit = 6000
autovacuum_naptime = '10s'
If you run periodic reporting scans, try io_workers = 6–8 and compare sequential/bitmap scan timings.
Profile C — Large analytics box
Assumptions: 64–128 cores, 256–512 GB RAM, scan-heavy (DWH/HTAP)
# --- Memory / cache ---
shared_buffers = 64GB # keep headroom for filesystem cache
effective_cache_size = 320GB
work_mem = 64MB # raise carefully; it multiplies
maintenance_work_mem = 8GB
# --- WAL / checkpoints ---
wal_compression = on
max_wal_size = 64GB
min_wal_size = 8GB
checkpoint_timeout = '30min'
checkpoint_completion_target = 0.95
synchronous_commit = on # consider 'remote_apply/off' per durability needs
# --- AIO (PostgreSQL 18) ---
io_method = worker
io_workers = 16 # ~25% of 64 cores; try 24–32 on 128 cores
# --- Planner hints ---
random_page_cost = 1.05
effective_io_concurrency = 256
# --- Autovacuum / large tables ---
autovacuum_max_workers = 6
autovacuum_vacuum_cost_limit = 8000
If you see a single busy backend saturating checksum/memcpy, scaling io_workers toward 1 per core can help sequential scans.
If your benchmarks on Linux show io_uring winning for your workload:
io_method = io_uring
# io_workers is ignored for io_uring, but leaving a small value is harmless.
io_workers = 4
Raise file descriptor limits (example for systemd service):
# /etc/systemd/system/postgresql.service.d/limits.conf
[Service]
LimitNOFILE=1048576
And at the shell for the postgres user (if needed):
ulimit -n 1048576
Watch out in containers: some runtimes disable io_uring. If submissions silently fall back or fail, use worker.
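After restarting with the new method, it’s worth confirming what the server actually picked up; a quick check from psql:
-- Confirm the active I/O method after restart.
SHOW io_method;
-- If the server refused to start under io_uring (common in locked-down
-- containers), set io_method back to 'worker' and restart.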