-
Type: Improvement
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Index Maintenance
Whenever you access an index node, you reach it from a parent or sibling. If the value for this index node is the same as its sibling's or parent's, then storing the value again is redundant and a huge waste of space, especially for full-text indexes. We could instead store a single byte that means 'the same as the previous one' and dramatically reduce index sizes, with all the performance benefits that brings.
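To make the 'same as the previous one' idea concrete, here is a minimal sketch in Python. It is not the server's actual index format: the marker bytes `SAME` and `FULL`, the 4-byte length prefix, and the function names are all assumptions made for illustration.

```python
import struct

# Hypothetical one-byte markers; not part of any real on-disk format.
SAME = 0x00  # this entry's value equals the previous entry's value
FULL = 0x01  # a 4-byte little-endian length and the raw value follow


def encode_keys(keys):
    """Encode an ordered list of byte-string keys, collapsing repeats."""
    out = bytearray()
    prev = None
    for key in keys:
        if prev is not None and key == prev:
            out.append(SAME)  # one byte instead of the whole value
        else:
            out.append(FULL)
            out += struct.pack("<I", len(key))
            out += key
        prev = key
    return bytes(out)


def decode_keys(buf):
    """Rebuild the original list of keys from the encoded bytes."""
    keys = []
    pos = 0
    prev = None
    while pos < len(buf):
        marker = buf[pos]
        pos += 1
        if marker == SAME:
            if prev is None:
                raise ValueError("SAME marker with no previous value")
            keys.append(prev)
        elif marker == FULL:
            (length,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            prev = buf[pos:pos + length]
            pos += length
            keys.append(prev)
        else:
            raise ValueError("unknown marker byte")
    return keys
```

With four consecutive entries for `b"mongodb"` followed by one for `b"index"`, this sketch stores the repeated value once plus three single-byte markers, rather than four full copies.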
It would also be good for the initial node (the one that holds the value) to contain a pointer to the next 'different' node, if we don't do that already. All of this is key to competing with a column store on ad-hoc counting and summarising. I'm sure there is a JIRA for counting indexes that plays into this too; however, this one is about performance and index compression. This is essentially a way to skip through the index, the way we can currently skip a sub-object when parsing BSON.
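The pointer to the next 'different' node could look something like the sketch below, again in Python and purely illustrative. The node layout (a dict with `value` and `next_diff`) and the function names are assumptions, not an existing server structure. The point is that counting per value only touches the first node of each run.

```python
def build_runs(sorted_values):
    """Build hypothetical index nodes from values in index order.

    The first node of each run holds the value and `next_diff`, the
    position of the next node with a different value. Repeat nodes
    hold neither (they mean 'same as the previous one').
    """
    nodes = []
    run_start = 0
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            nodes.append({"value": None, "next_diff": None})
        else:
            if i > 0:
                # Close the previous run: it ends where this one starts.
                nodes[run_start]["next_diff"] = i
            run_start = i
            nodes.append({"value": value, "next_diff": None})
    if nodes:
        # The last run points one past the end of the index.
        nodes[run_start]["next_diff"] = len(nodes)
    return nodes


def count_by_value(nodes):
    """Count entries per value by hopping from run head to run head."""
    counts = {}
    i = 0
    while i < len(nodes):
        head = nodes[i]
        counts[head["value"]] = head["next_diff"] - i
        i = head["next_diff"]  # skip the whole run in one step
    return counts
```

For example, `count_by_value(build_runs(["a", "a", "a", "b", "c", "c"]))` visits three nodes rather than six, much as a BSON parser can use a sub-object's length prefix to step over it without reading its contents.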