Scaling Relational Databases: Sharding, Partitioning, and Read Replicas Compared

Updated: 14 May, 2026 · 8 mins read
Andrei, Lead Engineer

Relational databases can scale much further than many teams expect. A well-indexed PostgreSQL, MySQL, SQL Server, or managed cloud database can often support serious production traffic for years before the architecture needs to change. The harder question is not "How do we scale the database?" It is "Which pressure are we actually trying to relieve?"

Sharding, partitioning, and read replicas are often discussed together because they all split work in some way. But they solve different problems.

Read replicas spread read traffic across additional database instances. Partitioning divides a large table into smaller physical segments inside the database engine. Sharding distributes data across multiple independent database nodes or clusters, usually requiring application or middleware awareness.

Those distinctions matter. Choosing the wrong pattern can add operational complexity without improving the bottleneck. Choosing the right one can extend the life of a relational system, reduce cloud cost growth, and keep engineering teams shipping without turning the database into a permanent migration project.

For organisations designing cloud platforms, this is less about fashionable architecture and more about constraints: workload shape, write volume, consistency requirements, tenancy model, reporting needs, team maturity, cost, compliance, and recovery expectations. That constraint-led view is central to good cloud architecture, as Westpoint discusses in Cloud architecture is not about technology. It is about constraints.

The scaling problem is usually three problems

Database scaling conversations often start too late, when latency has already degraded or maintenance jobs are running into business hours. By that point, every option looks urgent. A better approach is to separate the pressure into three categories.

The first is read pressure. The application has many queries, dashboards, exports, search pages, API calls, or reporting jobs competing with transactional traffic. CPU, buffer cache, connection limits, and I/O are under pressure, but most of the work is reading existing data.

The second is data management pressure. A table has grown large enough that index maintenance, vacuuming, backups, retention, deletes, query planning, or archival jobs have become slow or unpredictable. The issue may not be total database throughput; it may be that one or two tables are too large to operate cleanly.

The third is write and storage pressure. The primary database can no longer handle the write volume, working set, lock contention, tenant growth, or storage footprint on a single node. Vertical scaling is reaching practical or financial limits. This is where the architecture starts to become distributed.

Read replicas mainly address read pressure. Partitioning mainly addresses large-table management and query pruning. Sharding addresses write, storage, tenant, and horizontal scale limits, but at the highest operational cost.

Read replicas: scale reads without changing the data model

A read replica is a copy of the primary database used for read-only traffic. The application writes to the primary and sends selected reads to one or more replicas. Managed services make this pattern easier to adopt. For example, Amazon RDS read replicas are designed to let teams serve read-heavy workloads from multiple copies of the data, and AWS notes that RDS uses database engines' native asynchronous replication for engines such as MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server.

Read replicas are usually the lowest-friction scaling option because they preserve the primary schema and application model. You do not need to split tenants, redesign keys, or rewrite every query. Many teams introduce replicas for reporting, analytics extracts, public read APIs, product catalogue browsing, admin dashboards, or background jobs.

They work best when reads can tolerate slight replication lag. In asynchronous replication, the replica trails the primary by some amount of time. That lag may be milliseconds under normal load and much longer during spikes, maintenance, long transactions, or network issues. The application must understand which reads require fresh data.

A common pattern is read-your-writes routing. After a user changes an order, profile, booking, or payment state, the next request for that user goes to the primary for a short period. Less sensitive reads go to replicas. This keeps user experience consistent without forcing every query onto the primary.
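
A minimal sketch of that routing rule, in Python, assuming a connection pool per role and an in-process record of each user's last write (the names and the window are illustrative, not any specific framework's API):

```python
import time

# Stand-ins for real connection pools (psycopg, SQLAlchemy, PgBouncer endpoints, etc.).
PRIMARY_POOL = "primary"
REPLICA_POOL = "replica"

# Pin a user's reads to the primary for this long after a write. The window
# should comfortably exceed the replication lag you actually observe.
READ_YOUR_WRITES_WINDOW_SECONDS = 5.0

# Last write time per user. A real system would keep this in the session,
# a signed cookie, or a shared cache rather than process memory.
_last_write_at: dict = {}

def record_write(user_id: str) -> None:
    """Call after any write made on behalf of the user."""
    _last_write_at[user_id] = time.monotonic()

def pool_for_read(user_id: str, requires_fresh: bool = False) -> str:
    """Route freshness-critical reads and recent writers to the primary."""
    last_write = _last_write_at.get(user_id)
    wrote_recently = (
        last_write is not None
        and time.monotonic() - last_write < READ_YOUR_WRITES_WINDOW_SECONDS
    )
    return PRIMARY_POOL if requires_fresh or wrote_recently else REPLICA_POOL
```

The design point is that freshness is decided per read path, not globally, so replicas keep absorbing the bulk of the traffic while sensitive journeys stay correct.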

Read replicas are not a general answer to write bottlenecks. Every write still lands on the primary before it is replicated. They can even increase write-side work because changes must be streamed and applied elsewhere. They also introduce operational questions: replica lag monitoring, failover behaviour, backup policy, schema migration sequencing, query routing, and the risk of running expensive reporting queries that saturate the replica and fall behind.

The strength of read replicas is that they buy time and reduce load with relatively little data model disruption. The weakness is that they do not remove the single write primary.

Partitioning: make large tables governable

Partitioning splits a large table into smaller pieces according to a partition key. In PostgreSQL, for example, declarative table partitioning lets a table be defined as partitioned while the actual storage belongs to child partitions. Rows are routed to partitions based on the partition key, such as date, tenant, region, status, or hash value.

Unlike sharding, partitioning usually happens inside one database system. The application may still query the logical parent table. The database planner can then prune partitions when the query predicate matches the partitioning scheme.

Partitioning is especially useful for time-series or lifecycle-heavy data. Audit logs, events, telemetry, transactions, orders, messages, and billing records often have natural time boundaries. If queries usually target the last day, week, month, or quarter, partitioning by date can reduce the amount of data scanned. It can also make retention cleaner: dropping an old monthly partition is usually much cheaper than deleting millions of individual rows.
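
A minimal sketch of that setup in PostgreSQL terms, written as DDL strings a migration script could execute; the events table, its columns, and the monthly granularity are assumptions for illustration:

```python
# PostgreSQL declarative partitioning: the parent table owns no storage,
# rows land in whichever monthly child partition matches occurred_at.
CREATE_EVENTS_PARENT = """
CREATE TABLE events (
    id          bigint      NOT NULL,
    tenant_id   bigint      NOT NULL,
    occurred_at timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (occurred_at);
"""

def monthly_partition_ddl(year: int, month: int) -> str:
    """DDL for one monthly partition of the hypothetical events table."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return (
        f"CREATE TABLE events_{year}_{month:02d} PARTITION OF events "
        f"FOR VALUES FROM ('{year}-{month:02d}-01') "
        f"TO ('{next_year}-{next_month:02d}-01');"
    )

# Retention becomes a metadata operation instead of a multi-million-row DELETE.
DROP_EXPIRED_MONTH = "DROP TABLE IF EXISTS events_2024_05;"
```

Queries that filter on occurred_at can then be pruned to the relevant months, which is where the scan reduction described above actually comes from.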

Partitioning can also help with operational maintenance. Smaller partitions can make index rebuilds, backups, data loading, and archival processes easier to schedule. Different partitions can have different storage or indexing strategies in some systems. For multi-tenant platforms, partitioning by tenant or by a hash of tenant ID can isolate some access patterns, though it needs careful design to avoid creating too many partitions or concentrating large tenants in awkward places.

The trap is assuming partitioning is a performance button. It only helps when the database can exclude partitions or operate on smaller physical units. If queries do not include the partition key, the database may still inspect many partitions. If the partitioning key has poor cardinality, a few partitions may become hot. If the team creates thousands of partitions without understanding planner and maintenance overhead, performance can get worse.

Partitioning is often a strong middle step before sharding. It keeps the relational model intact while making the largest tables easier to operate. For many SaaS systems, it is the difference between "the database is too big" and "this table needs lifecycle design."

Sharding: distribute ownership of data

Sharding distributes data across multiple database nodes or clusters. Each shard owns a subset of the data. That subset might be based on tenant ID, customer ID, region, account, hash range, geography, or another shard key.

The key difference from partitioning is that shards are separate database units. A query may need to know which shard to contact. A transaction may become harder if it crosses shards. Schema changes, backups, migrations, data repair, and observability must be coordinated across multiple databases.

Sharding is powerful because it can scale writes and storage horizontally. Instead of one primary taking all writes, writes are distributed across shard primaries. A large tenant can be isolated. Regional shards can bring data closer to users or regulatory boundaries. Hardware and managed database limits become less absolute because capacity can be added in units.

The cost is complexity. The shard key becomes one of the most important architectural decisions in the system. A good shard key spreads load evenly, aligns with common queries, limits cross-shard transactions, and supports future growth. A poor shard key creates hot shards, expensive fan-out queries, awkward rebalancing, and application logic that becomes hard to reason about.

For example, tenant ID is often attractive in B2B SaaS because most application queries are tenant-scoped. But if one tenant is much larger than the rest, that shard can become overloaded. Geographic sharding can support data residency and latency goals, but global users, shared reporting, and cross-region workflows become harder. Hash-based sharding spreads load more evenly, but range queries and operational debugging can become less intuitive.
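
To make the hash-based option concrete, here is a minimal routing sketch; the DSNs are placeholders and the modulo scheme is the simplest possible choice, so treat it as an illustration rather than a production design:

```python
import hashlib

# Placeholder shard connection strings; capacity is added one unit at a time.
SHARD_DSNS = [
    "postgresql://shard0.internal/app",
    "postgresql://shard1.internal/app",
    "postgresql://shard2.internal/app",
    "postgresql://shard3.internal/app",
]

def shard_for_tenant(tenant_id: str) -> str:
    """Map the shard key (tenant ID) to a shard, stably across processes.

    hashlib is used instead of Python's built-in hash(), which is randomised
    per process and would send the same tenant to different shards on restart.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return SHARD_DSNS[int.from_bytes(digest[:8], "big") % len(SHARD_DSNS)]
```

The weakness of plain modulo routing is that adding a shard remaps most tenants at once, which is why many teams move to a tenant-to-shard directory or consistent hashing once rebalancing becomes a real requirement.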

Some systems provide built-in sharding or distributed SQL behaviour. MongoDB's sharding documentation describes sharding as a way to distribute data across multiple machines for very large datasets and high-throughput applications. Distributed SQL databases take a different approach by distributing and replicating SQL data beneath a familiar SQL interface. Those platforms can reduce the amount of custom sharding logic, but they still require teams to understand locality, transaction costs, indexes, failover, latency, and operational limits.

In a traditional relational stack, manual sharding is a major architectural step. It should be driven by evidence, not anxiety.

A practical comparison

Read replicas are usually the easiest to introduce. They are a strong fit when the primary is overloaded by reads, dashboards, exports, search pages, or background jobs. They require query routing and replica lag handling, but they do not force a new data ownership model.

Partitioning is a design choice inside the database. It is a strong fit when individual tables have become too large to query, maintain, archive, or delete from predictably. It needs a partition key that matches real access patterns. It is not a substitute for read scaling if the same hot partition still receives all the traffic.

Sharding is a distribution strategy. It is a strong fit when a single database node or cluster cannot handle write volume, storage size, tenant isolation, or regional requirements. It changes how the application thinks about data. It can be the right answer, but it is rarely the first answer.

The order matters. Many teams should first tune queries and indexes, improve connection management, remove accidental N+1 query patterns, introduce caching where correctness permits, and separate analytical workloads from transactional ones. Then read replicas and partitioning can extend the current architecture. Sharding comes when the system has outgrown the single-writer model or when isolation requirements make shared storage the wrong shape.

How these patterns combine

These patterns are not mutually exclusive. Mature systems often use all three.

A sharded system can have read replicas per shard. A partitioned table can live on each shard. A primary database can use partitioning for large event tables and read replicas for reporting. The important point is to know which layer is solving which problem.

Consider a logistics platform with high write volume from vehicle events, operational dashboards for dispatch teams, and long retention requirements for compliance. Time-based partitioning may help manage the event table. Read replicas may protect the write primary from dashboards and exports. Sharding by region or customer group may eventually be needed if write traffic or data residency requirements exceed what one cluster can handle.

That evolution should be planned, not improvised under outage pressure. Westpoint's cloud architecture and development work often centres on this kind of scaling decision: designing platforms that can grow under real demand while keeping operational ownership clear.

The hidden cost is operational surface area

Technical diagrams tend to make scaling patterns look cleaner than they are. The actual cost appears in operations.

Read replicas require monitoring for lag, query saturation, storage drift, and failover readiness. Teams need clear rules for which traffic can use replicas. Schema migrations must account for replication delay and long-running queries.
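
For PostgreSQL specifically, lag can be sampled on the replica itself; below is a minimal sketch using psycopg2, with the alert threshold as an assumption to tune against what user journeys can tolerate:

```python
from typing import Optional

import psycopg2

LAG_ALERT_SECONDS = 10.0  # assumed threshold, not a universal recommendation

def replica_lag_seconds(replica_dsn: str) -> Optional[float]:
    """Approximate replication lag, in seconds, measured on a PostgreSQL replica."""
    with psycopg2.connect(replica_dsn) as conn, conn.cursor() as cur:
        # Time since the last replayed transaction; NULL if nothing has replayed yet.
        cur.execute(
            "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp());"
        )
        (lag,) = cur.fetchone()
        return float(lag) if lag is not None else None

def lag_is_acceptable(replica_dsn: str) -> bool:
    lag = replica_lag_seconds(replica_dsn)
    return lag is not None and lag < LAG_ALERT_SECONDS
```

One caveat worth knowing: on an idle primary this value keeps growing even when the replica is fully caught up, so it is usually combined with WAL-position checks before paging anyone.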

Partitioning requires partition lifecycle management. New partitions need to exist before data arrives. Old partitions need retention policies. Indexes must be designed around the partitioning strategy. Query plans must be checked to confirm pruning is happening. Application queries may need to include the partition key consistently.
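
A sketch of that lifecycle job for the monthly events partitions used earlier; it only generates the DDL, leaving execution and scheduling to whatever migration or cron tooling the team already runs, and the retention window is an assumed value:

```python
from datetime import date

RETENTION_MONTHS = 24  # assumed retention window for the events table

def _shift_month(today: date, offset: int) -> tuple:
    """(year, month) shifted by offset months from today."""
    total = today.year * 12 + (today.month - 1) + offset
    return total // 12, total % 12 + 1

def lifecycle_ddl(today: date) -> list:
    """Pre-create next month's partition and drop the one past retention."""
    ny, nm = _shift_month(today, 1)                  # must exist before data arrives
    ey, em = _shift_month(today, 2)                  # exclusive upper bound
    dy, dm = _shift_month(today, -RETENTION_MONTHS)  # oldest month to remove
    return [
        f"CREATE TABLE IF NOT EXISTS events_{ny}_{nm:02d} PARTITION OF events "
        f"FOR VALUES FROM ('{ny}-{nm:02d}-01') TO ('{ey}-{em:02d}-01');",
        f"DROP TABLE IF EXISTS events_{dy}_{dm:02d};",
    ]
```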

Sharding requires routing, rebalancing, shard-aware migrations, cross-shard reporting, backup coordination, incident response per shard, tenant movement, data repair, and sometimes distributed transaction design. It also changes how engineers debug production issues. "Look in the database" becomes "find the correct shard, then inspect the right database, with the right tenant context, at the right point in replication."

This is also where cost discipline matters. Scaling out can hide waste. More replicas, larger instances, extra storage, cross-region replication, and over-retained partitions all show up on the cloud bill. Westpoint's article on why cloud costs keep growing is relevant here: cost growth is often systemic, caused by architecture, ownership, and operating habits rather than one bad invoice line.

Decision framework

Start with workload evidence. Look at slow queries, wait events, CPU, I/O, memory, lock contention, connection pressure, replication lag, table growth, index size, checkpoint behaviour, and business traffic patterns. Scaling decisions should be based on measured pressure.

Choose read replicas when the workload is read-heavy and stale reads are acceptable for some paths. Make routing explicit. Decide which reads must hit the primary. Monitor lag as a user-facing reliability signal, not just an infrastructure metric.

Choose partitioning when table size and lifecycle management are the pain. Use a key that appears naturally in queries and retention rules. For event-like data, time is often the best starting point. For tenant-heavy systems, tenant-aware partitioning can help, but the distribution of tenant size matters.

Choose sharding when writes, storage, isolation, or locality have exceeded the practical limits of a single database architecture. Spend serious time on the shard key. Model cross-shard workflows before building. Decide how reporting, billing, search, support tooling, backups, and migrations will work.

Do not treat sharding as a way to avoid data modelling. It makes data modelling more important. Every accidental cross-shard join becomes an application, platform, or analytics problem.

A common migration path

A sensible relational scaling journey often looks like this:

  • Tune the current database: indexes, query plans, connection pools, schema design, caching, and workload separation.
  • Add read replicas for read-heavy paths that can tolerate lag.
  • Partition the largest lifecycle-heavy tables.
  • Move analytical workloads to a separate store or pipeline where needed.
  • Introduce sharding only when single-primary limits, tenant isolation, data residency, or write throughput require it.
  • Revisit platform operations: deployment pipelines, migrations, observability, incident response, cost ownership, and security controls.

The surrounding engineering system matters as much as the database topology. CI/CD pipelines, migration safety, rollback plans, and ownership boundaries all become more important as the data layer becomes more distributed. That is one reason database scaling work often intersects with platform engineering and delivery practices, including the pipeline concerns discussed in Why your CI/CD pipeline wasn't built for microservices.

What leaders should ask before approving a scaling change

A database scaling proposal should answer practical questions.

What metric proves the current bottleneck? Which user journeys are affected? How much headroom does the proposed change create? What new failure modes does it introduce? Who owns those failure modes? What happens during migration? How are backups and restores tested? How does the design affect cloud spend? Can the team operate it during a partial outage?

For read replicas, ask how replica lag affects correctness. For partitioning, ask whether real queries include the partition key. For sharding, ask what breaks when a workflow needs data from two shards.

The best scaling decision is not always the most technically impressive one. It is the one that relieves the actual constraint while preserving delivery speed, data correctness, and operational clarity.

Conclusion

Read replicas, partitioning, and sharding are three different answers to three different scaling pressures.

Read replicas scale read-heavy workloads with limited application disruption, but they do not solve write bottlenecks and they introduce consistency lag. Partitioning makes large relational tables easier to query and manage, but it only pays off when the partition key matches the workload. Sharding distributes data and write load across multiple database units, but it turns the data layer into a distributed system with all the operational responsibility that implies.

Most teams should earn their way toward sharding. Use evidence first. Reduce avoidable load. Add replicas where read pressure is real. Partition tables whose size has become an operational problem. Shard only when the business, workload, or compliance model genuinely demands horizontal data ownership.

That discipline keeps relational databases useful for longer, makes cloud costs easier to explain, and gives engineering teams a scaling path they can operate with confidence.

Frequently asked questions

What is the difference between partitioning and sharding?
Partitioning usually splits a large table into smaller physical pieces inside the same database system. Sharding distributes data across separate database nodes or clusters, usually requiring routing logic and more operational coordination.

Do read replicas fix write bottlenecks?
Read replicas mainly help read-heavy workloads. Writes still go to the primary database, then replicate outward. They can reduce pressure from reporting, dashboards, and read APIs, but they do not remove the single write-primary limit.

When is sharding the right choice?
Sharding is usually appropriate when write volume, storage size, tenant isolation, regional requirements, or single-node limits make one database cluster insufficient. It should follow measured evidence because it adds significant complexity.

Does partitioning always improve performance?
Partitioning can improve performance when queries include the partition key and the database can prune irrelevant partitions. If queries do not align with the partitioning strategy, partitioning may add overhead without improving response time.

Can these patterns be combined?
Yes. A production system might use partitioned tables, read replicas for reporting, and eventually sharding for tenant or regional scale. The important point is to know which pattern solves which pressure.
