
RDS Cost: The Three Configurations Nobody Reviews


RDS costs fall into three distinct charge categories: instance compute (the DB instance class), storage (the gp2, gp3, or io1 volume), and Multi-AZ standby capacity. Each category has its own optimization opportunities, and each contains configurations that survive standard cost reviews for structural reasons.

This article covers the three configurations we find most consistently across accounts: Multi-AZ on databases where it is not justified, read replicas with zero query traffic, and storage auto-scaling that has triggered and never scaled back. Across the 50 accounts analyzed in our initial access period, these three categories represent 14% of total AWS savings identified — making them the third-largest optimization category after non-production scheduling and EBS snapshot accumulation.

Configuration 1: Multi-AZ on Non-Critical Databases

AWS RDS Multi-AZ creates a synchronous standby replica in a different Availability Zone and provides automatic failover in approximately 60-120 seconds if the primary fails. For databases on the critical path of user-facing requests, where a 60-120 second outage would cause user-visible service disruption, Multi-AZ is justified even at double the instance price.

For databases where a 60-120 second or even 10-minute recovery time is acceptable, Multi-AZ provides no meaningful availability improvement over a single-AZ instance with automated backups. A database that runs a nightly batch job for internal reporting, a database that stores processed events for offline analytics, a database backing a non-revenue admin interface — for these workloads, the Multi-AZ standby instance is capacity that is never used and serves a purpose (sub-2-minute failover) that the workload does not require.

Identifying non-critical databases requires answering one question per database: what is the maximum acceptable recovery time if this database becomes unavailable? The answer requires talking to the team that owns the workload. Cost Explorer cannot tell you the criticality of a database — only the engineering team that owns it can. This is why RDS Multi-AZ review is typically lower-priority in FinOps programs: it requires a conversation with each team rather than an automated analysis.

The practical approach: identify all Multi-AZ RDS instances, send a structured questionnaire to the owning team (maximum acceptable downtime: <2 min / 2-10 min / 10-60 min / >60 min), and convert instances where the answer is 10 minutes or more to single-AZ, with automated backups enabled and a restore-from-backup procedure documented. Because the standby is billed at the same rate as the primary, disabling Multi-AZ on a db.m5.xlarge at $0.384/hour saves roughly $280/month per instance ($0.384 × 730 hours). Across a large account with 30 databases, identifying that 40% are non-critical generates roughly $3,360/month in savings.
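The arithmetic above can be sketched as a short estimator. This is a back-of-the-envelope model, not a billing tool: it assumes the standard RDS pricing behavior (Multi-AZ billed at 2x the single-AZ instance rate) and AWS's ~730-hour billing month; the function names are illustrative.

```python
# Assumption: Multi-AZ costs exactly 2x the single-AZ instance rate,
# so removing the standby saves one full instance-month.
HOURS_PER_MONTH = 730  # AWS's standard billing-month convention

def multi_az_monthly_savings(single_az_hourly_rate: float) -> float:
    """Monthly savings from converting one Multi-AZ instance to single-AZ."""
    return single_az_hourly_rate * HOURS_PER_MONTH

def fleet_savings(hourly_rate: float, total_instances: int,
                  non_critical_fraction: float) -> float:
    """Fleet-wide estimate if a fraction of Multi-AZ databases can be converted."""
    convertible = int(total_instances * non_critical_fraction)
    return convertible * multi_az_monthly_savings(hourly_rate)

# db.m5.xlarge at $0.384/hour: the standby costs ~$280/month.
print(round(multi_az_monthly_savings(0.384), 2))    # 280.32
# 30 databases, 40% non-critical: ~$3,360/month.
print(round(fleet_savings(0.384, 30, 0.40), 2))     # 3363.84
```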

Configuration 2: Read Replicas With Zero Query Traffic

RDS read replicas are created to offload read queries from the primary instance and to serve as promotion candidates if the primary fails. A read replica that receives no queries serves neither purpose but still runs as a full instance at the same cost as an equivalent standalone instance.

Zero-traffic read replicas occur in two scenarios. First, the application that used the replica's endpoint was modified to use the primary endpoint directly (a common change when query latency is not a concern and the developer wants consistent reads), but the replica was not decommissioned. Second, the replica was created for an expected traffic volume that did not materialize — a feature that was built but not launched, or a migration that was completed but the read replica from the migration period was not removed.

Identifying zero-traffic replicas is straightforward: the DatabaseConnections CloudWatch metric shows the number of active connections per RDS instance. A read replica with fewer than 5 connections averaged over 30 days is receiving negligible query traffic. For replicas with larger connection counts, the ReadIOPS metric provides a secondary check — if ReadIOPS is near-zero despite non-zero connections, the connections are health checks rather than query traffic, and the replica is effectively unused.
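The two-metric check above can be expressed as a small classifier. The inputs are 30-day averages of the DatabaseConnections and ReadIOPS CloudWatch metrics (fetched however you prefer: GetMetricStatistics, a dashboard export, etc.); the thresholds mirror the heuristics in the text and are tuning assumptions, not AWS-defined limits.

```python
# Heuristic thresholds from the review procedure described above.
CONNECTION_THRESHOLD = 5    # avg connections below this => negligible traffic
READ_IOPS_THRESHOLD = 1.0   # near-zero reads despite open connections

def is_effectively_unused(avg_connections: float, avg_read_iops: float) -> bool:
    """Return True if a read replica looks unused over the sample window."""
    if avg_connections < CONNECTION_THRESHOLD:
        return True  # almost nothing connects at all
    # Connections exist but issue no reads: likely health checks, not queries.
    return avg_read_iops < READ_IOPS_THRESHOLD
```

A replica averaging 40 connections but 0.1 ReadIOPS is flagged (health-check traffic), while one averaging 40 connections and 250 ReadIOPS is not.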

Before deleting a read replica, verify that it is not configured as a promotion candidate in the application's failover logic. Some applications implement manual failover by updating a configuration parameter to point at the replica's endpoint. If the replica is listed as a failover option in any configuration management system, confirm the failover procedure before deletion.

Configuration 3: Storage Auto-Scaling That Never Scales Back

RDS storage auto-scaling automatically expands EBS storage capacity when the database is running low. This is a valuable feature — running out of RDS storage space causes an immediate database failure. However, RDS storage auto-scaling only scales up, never down. Once storage has expanded from 500GB to 1TB due to a data load that subsequently reduced, the allocated storage remains at 1TB indefinitely.

The cost impact depends on storage type and size. gp3 storage costs $0.115/GB-month. An instance that expanded from 500GB to 1TB has an extra 500GB of allocated storage costing $57.50/month that is entirely empty. For instances that have auto-scaled multiple times or that handled a large one-time data load, the gap between allocated and used storage can be substantial.

Identifying this gap requires querying the RDS FreeStorageSpace CloudWatch metric against the AllocatedStorage configuration value. If FreeStorageSpace consistently represents more than 50% of AllocatedStorage over a 60-day period, the database is carrying excess allocated capacity that could be reduced. Note that RDS does not natively support storage reduction — you must snapshot the instance, create a new instance from the snapshot with the reduced storage allocation, verify data integrity, update the application endpoint, and decommission the original. This process has a maintenance window component and is more involved than the Multi-AZ change, which is why it is often deferred.

The threshold for justifying the storage reduction effort is approximately $100/month in excess storage cost. Below that threshold, the engineering time cost of the reduction procedure may exceed the monthly savings. Above it, the annualized savings of $1,200 or more justify the 3-4 hours of engineering time required.
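The gap check and the $100/month decision rule can be combined into one sketch. One unit detail worth encoding: AllocatedStorage (from the RDS API or console) is reported in GiB, while the FreeStorageSpace CloudWatch metric is reported in bytes, so convert before comparing. The price and thresholds follow the figures in the text; the function name is illustrative.

```python
GIB = 1024 ** 3                 # FreeStorageSpace is reported in bytes
GP3_PRICE_PER_GB_MONTH = 0.115  # gp3 storage price used in the text
ACTION_THRESHOLD = 100.0        # $/month below which reduction isn't worth it

def storage_gap_report(allocated_gib: int, avg_free_bytes: float):
    """Return (monthly cost of unused capacity, flagged, worth_reducing).

    flagged mirrors the heuristic above: free space consistently exceeding
    50% of the allocation over the sample window.
    """
    free_gib = avg_free_bytes / GIB
    gap_cost = round(free_gib * GP3_PRICE_PER_GB_MONTH, 2)
    flagged = free_gib > allocated_gib * 0.5
    worth_reducing = flagged and gap_cost >= ACTION_THRESHOLD
    return gap_cost, flagged, worth_reducing
```

A 1,024 GiB instance averaging 600 GiB free is flagged, but its $69/month gap falls below the action threshold; a 2,000 GiB instance averaging 1,200 GiB free ($138/month) clears both bars.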

RDS Instance Right-Sizing: The Fourth Configuration

Beyond the three configurations above, RDS instance right-sizing follows the same logic as EC2 right-sizing but with additional complexity. RDS instances have two primary resource dimensions that matter for right-sizing: CPU and memory. Network I/O and disk throughput are important for write-heavy workloads but are less frequently the binding constraint.

The specific challenge for RDS right-sizing is that database performance is more sensitive to under-provisioning than application server performance. An EC2 application server that is undersized will slow down requests proportionally — the performance degradation is gradual and usually recoverable. An undersized RDS instance can experience lock contention and queue buildup that creates cascading failures rather than graceful degradation.

The recommended approach for RDS right-sizing is conservative: use a 60% headroom buffer (rather than the 30% used for stateless EC2 services), require 90 days of utilization data rather than 30, and only propose a downsize of one step in the instance class hierarchy at a time. A db.m5.4xlarge running at 12% average CPU should be recommended for db.m5.2xlarge, not db.m5.xlarge — the two-step jump is a larger risk even if the utilization data supports it.
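The one-step rule can be sketched as follows. The size ladder and the linear "half the vCPUs means roughly double the utilization" projection are simplifying assumptions for illustration; real decisions need the full 90-day metric review described above.

```python
# Hypothetical size ladder for one instance family; extend as needed.
SIZE_LADDER = ["db.m5.large", "db.m5.xlarge", "db.m5.2xlarge",
               "db.m5.4xlarge", "db.m5.8xlarge"]
HEADROOM = 0.60  # keep 60% free capacity on the proposed target

def recommend_downsize(instance_class: str, avg_cpu_pct: float):
    """Return the next size down if projected CPU stays under the headroom
    ceiling, else None. Never jumps more than one step at a time."""
    idx = SIZE_LADDER.index(instance_class)
    if idx == 0:
        return None  # already the smallest class considered
    projected_cpu = avg_cpu_pct * 2   # half the vCPUs => ~2x utilization
    ceiling = (1 - HEADROOM) * 100    # 40% ceiling with a 60% buffer
    return SIZE_LADDER[idx - 1] if projected_cpu <= ceiling else None

# db.m5.4xlarge at 12% avg CPU projects to 24% on a 2xlarge: within the buffer.
print(recommend_downsize("db.m5.4xlarge", 12.0))  # db.m5.2xlarge
```

At 30% average CPU the same instance projects to 60% on the smaller class, which violates the 60% headroom buffer, so no downsize is proposed.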

Prioritizing RDS Optimization Work

Across the three configurations described in this article, the optimal review sequence for most accounts is: first, the storage auto-scaling audit (takes two hours, no conversations required, savings are immediate); second, the read replica zero-traffic audit (takes two hours plus verification, no team conversations required for clearly zero-traffic replicas); third, the Multi-AZ criticality review (requires team conversations, but at roughly $280/month per converted instance the savings justify the effort for any account with more than 10 Multi-AZ instances).

The Multi-AZ review also generates useful organizational data: mapping each RDS database to its owning team and its criticality tier. That mapping is reusable for future cost optimization work, disaster recovery planning, and SLA documentation — making the upfront investment in the review more broadly valuable than the immediate cost savings alone.

As we describe in our account-level analysis in what we learned from our first 50 customer accounts, RDS optimization is most effective when run concurrently with non-production scheduling and the EBS snapshot audit: together, those top three categories represent 61% of the savings identified.

Identify your RDS optimization opportunities in 15 minutes

KernelRun scans RDS instances for Multi-AZ appropriateness, zero-traffic replicas, and storage allocation gaps within the first analysis run. Connect your first account in 4 minutes.

Request a Demo