Loading…

Rebalancing (Data Rebalancing)

Rebalancing is redistributing data across nodes or partitions to restore even load after cluster changes or skew.

Definition

Rebalancing is the process of redistributing data across nodes or partitions in a distributed system to restore even load and resource utilization.

How it works

When a cluster changes—a node joins, a node dies, or data skew develops—some nodes end up holding disproportionately more data or handling more requests than others. Rebalancing moves data from overloaded nodes to underloaded ones until the distribution is roughly even again.

There are several strategies. Fixed partition count assigns a set number of partitions at creation time (e.g., Kafka topics with 64 partitions). When a node joins, it takes ownership of some partitions from existing nodes. The data moves, but the partition boundaries stay fixed. Dynamic partitioning splits hot partitions when they exceed a size threshold and merges cold ones. HBase and MongoDB use this approach. Consistent hashing with virtual nodes (used by Cassandra and DynamoDB) assigns multiple token ranges per physical node. Adding a node means reassigning a subset of tokens, moving roughly 1/N of the data rather than reshuffling everything.

During rebalancing, the system must continue serving reads and writes. Most systems handle this by keeping the old assignment active until the new node has a full copy, then atomically switching ownership. Kafka, for example, uses an ISR (in-sync replica) list—a new replica catches up from the leader before becoming eligible to serve reads.

When to use it

After scaling a Cassandra cluster from 6 to 9 nodes: data needs to redistribute so the new nodes carry their share.
When a Kafka consumer group member crashes: partitions previously assigned to it must be reassigned to surviving consumers.
When a hot key in DynamoDB causes a single partition to throttle: adaptive capacity splits that partition automatically.
After a PostgreSQL Citus cluster adds a worker node: rebalance_table_shards() redistributes shards across the new topology.

Trade-offs

Gains: Even load distribution, better throughput, improved fault tolerance after node failures, and the ability to scale horizontally.

Costs: Rebalancing generates significant network and disk I/O. During the move, both source and destination nodes carry extra load. If done too aggressively, it can degrade latency for user-facing queries. Kafka partition reassignment, for instance, can saturate inter-broker bandwidth if you move too many partitions at once—the throttle flag on kafka-reassign-partitions.sh exists for this reason.

Example

Triggering a Kafka partition reassignment with throttling:

kafka-reassign-partitions.sh \
  --bootstrap-server broker1:9092 \
  --reassignment-json-file reassign.json \
  --execute \
  --throttle 50000000  # 50 MB/s limit

Rebalancing is closely tied to hash partitioning and range partitioning, since the partitioning strategy determines how data moves during a rebalance. It also relates to geo-replication, where cross-region data placement adds another dimension to rebalancing decisions.

Related & Recent Blogs

Tooling and DXcursor-claude-personas: Give Your AI Coding Assistant a Domain Expert Brain in 30 Seconds9 min read BackendFive Caching Strategies Every Backend Dev Should Know9 min read Web DevelopmentChrome 144 Introduces the <geolocation> Element: Declarative Location Access for Modern Web Apps5 min read System DesignHow Files Are Stored, Deleted, and Copied Inside Your Computer6 min read Computer ScienceThe Mechanics of Compression: How 100GB Becomes 25GB12 min read Web DevelopmentBuilding This Blog: A Modern Next.js Blog with Markdown5 min read

Share this definition

Technical Terms

Rebalancing (Data Rebalancing)

Rebalancing is redistributing data across nodes or partitions to restore even load after cluster changes or skew.

Definition

Rebalancing is the process of redistributing data across nodes or partitions in a distributed system to restore even load and resource utilization.

How it works

When to use it

After scaling a Cassandra cluster from 6 to 9 nodes: data needs to redistribute so the new nodes carry their share.
When a Kafka consumer group member crashes: partitions previously assigned to it must be reassigned to surviving consumers.
When a hot key in DynamoDB causes a single partition to throttle: adaptive capacity splits that partition automatically.
After a PostgreSQL Citus cluster adds a worker node: rebalance_table_shards() redistributes shards across the new topology.

Trade-offs

Gains: Even load distribution, better throughput, improved fault tolerance after node failures, and the ability to scale horizontally.

Example

Triggering a Kafka partition reassignment with throttling:

kafka-reassign-partitions.sh \
  --bootstrap-server broker1:9092 \
  --reassignment-json-file reassign.json \
  --execute \
  --throttle 50000000  # 50 MB/s limit

Definition

How it works

When to use it

Trade-offs

Example

Related terms

Related & Recent Blogs

Share this definition

Definition

How it works

When to use it

Trade-offs

Example

Related terms

Related & Recent Blogs

Share this definition