Loading...

XML

Word

Printable

JSON

Type: Task
Resolution: Won't Do
Priority: Unknown
Fix Version/s: None
Component/s: Performance
Labels:
None

Driver Changes:
Needed

Summary

99.99% of customers lean towards hash based sharding to maintain balanced shards. hash based sharding is slow and hurts the cpu for the shard nodes. this affects the overall performance and impacts any poc's we run in the field.

I believe the drivers are constantly polling the cluster and know exactly how many shards are available. If the drivers know how many shards there are, can we just have the drivers produce a random int on the number of shards and use that as the way to distribute the data evenly across the shards without the need for hashing the shardkey?

There are obviously edge cases where a customer may go from x to x+1 shards and we end up with an imbalance of data. Perhaps we can add a weighting factor to the randomint where the x+1 value occurs more than x.

Assignee:: Unassigned
Reporter:: Eugene Kang
Votes:: 0 Vote for this issue
Watchers:: 6 Start watching this issue

Created:: Jun 12 2023 06:46:56 PM UTC
Updated:: Jun 13 2023 01:51:23 AM UTC
Resolved:: Jun 13 2023 01:51:23 AM UTC

Details

Description

Summary

Attachments

Activity

People

Dates