-
Type: Task
-
Resolution: Won't Do
-
Priority: Unknown
-
None
-
Component/s: Performance
-
None
-
Needed
Summary
99.99% of customers lean towards hash based sharding to maintain balanced shards. hash based sharding is slow and hurts the cpu for the shard nodes. this affects the overall performance and impacts any poc's we run in the field.
I believe the drivers are constantly polling the cluster and know exactly how many shards are available. If the drivers know how many shards there are, can we just have the drivers produce a random int on the number of shards and use that as the way to distribute the data evenly across the shards without the need for hashing the shardkey?
There are obviously edge cases where a customer may go from x to x+1 shards and we end up with an imbalance of data. Perhaps we can add a weighting factor to the randomint where the x+1 value occurs more than x.