Type: Task
Resolution: Done
Priority: Major - P3
None
Affects Version/s: None
Component/s: None
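In LSM trees, we don't currently take duplicate items into account when sizing the bloom filters created during an LSM merge. If the same key appears in several of the chunks being merged, it is counted once per chunk, so the filter is sized for more items than it will actually hold. There are algorithms that can estimate the number of distinct items (cardinality estimation), and they are worth investigating.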
See:
http://www.datastax.com/dev/blog/improving-compaction-in-cassandra-with-cardinality-estimation
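To illustrate why duplicates matter, here is a minimal Python sketch. It assumes, hypothetically, that each chunk being merged carries a HyperLogLog sketch of its keys. HyperLogLog is one well-known cardinality estimator. The names `HyperLogLog`, `hash64` and `bloom_bits` are illustrative only and are not WiredTiger APIs. Two sketches combine by taking the register-wise maximum, so the merge can estimate the number of distinct keys across its inputs instead of summing per-chunk counts. It then sizes the bloom filter with the standard formula m = -n ln p / (ln 2)^2.

```python
import hashlib
import math


def hash64(key: bytes) -> int:
    """64-bit hash of a key; blake2b is used here only because it is in the stdlib."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (standard error ~1.04/sqrt(2**p))."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, key: bytes) -> None:
        x = hash64(key)
        idx = x >> (64 - self.p)                    # top p bits pick a register
        w = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - w.bit_length() + 1   # 1-based position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise maximum."""
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)          # linear counting for small sets
        return raw


def bloom_bits(n: int, fp_rate: float) -> int:
    """Bits needed for n distinct keys at the target false-positive rate."""
    return math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2)


# Two hypothetical chunks being merged; keys 20000..59999 appear in both.
chunk_a = [f"key{i:06d}".encode() for i in range(0, 60000)]
chunk_b = [f"key{i:06d}".encode() for i in range(20000, 80000)]

sketch_a, sketch_b = HyperLogLog(), HyperLogLog()
for k in chunk_a:
    sketch_a.add(k)
for k in chunk_b:
    sketch_b.add(k)

naive_count = len(chunk_a) + len(chunk_b)                 # counts shared keys twice
distinct_estimate = sketch_a.merge(sketch_b).estimate()   # should be near the true 80000
print(naive_count, round(distinct_estimate))
print(bloom_bits(naive_count, 0.01), bloom_bits(round(distinct_estimate), 0.01))
```

With a fixed bits-per-key budget the filter size grows linearly with n, so in this example summing the per-chunk counts (120000) would allocate roughly 1.5 times the bits that the 80000 distinct keys actually need.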
Related to:
WT-1157 LSM merges in update-heavy workloads (Closed)