- Type: New Feature
- Resolution: Done
- Priority: Minor - P4
- Affects Version/s: 1.7
- Component/s: None
Add support for lazy and raw subclasses of BsonDocument and BsonArray:
BsonValue
    BsonDocument : BsonValue
        LazyBsonDocument : BsonDocument
        RawBsonDocument : BsonDocument
    BsonArray : BsonValue
        LazyBsonArray : BsonArray
        RawBsonArray : BsonArray
In all cases, when a lazy/raw document or array is deserialized, the raw bytes will be saved without any further deserialization at that time. If the lazy/raw document is later serialized, the raw bytes can be written back out without any need to reserialize them. This can be a huge performance win when copying documents from one place to another.
Since both LazyBsonDocument and RawBsonDocument derive from BsonDocument you will be able to use them anywhere a BsonDocument can be used. In particular, you will be able to access the elements of a lazy/raw document.
The two classes will differ in how they handle accessing the elements (and therefore in their performance characteristics).
A LazyBsonDocument will immediately deserialize one level deep as soon as you access the document contents in any way. Any embedded documents (and arrays) will be lazy themselves and will not be deserialized unless (and until) you attempt to access their contents. Once a level has been deserialized, it essentially becomes a normal BsonDocument, with no meaningful performance difference from that point on. If you access every part of the document, the end result is that the whole document has been deserialized, just one level at a time as you accessed different parts of it. If you are going to access the entire document, you might as well use a regular BsonDocument and deserialize the whole thing up front, but if you only need to access some parts of the document, a LazyBsonDocument could be a big performance win.
A RawBsonDocument always keeps the raw bytes representation of the document. You can still access any part of the document but, unlike with a LazyBsonDocument, this will not trigger any permanent deserialization. Only the one element you access will be deserialized, and if you access that element again in the future, it will have to be deserialized again. This representation is beneficial when you only need to access very few fields before sending the document on somewhere else: because no permanent deserialization is triggered, the document never has to be reserialized. Note that a RawBsonDocument is immutable, so it can only be used when you want to send the document on unmodified.
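To make the trade-off between the two access strategies concrete, here is a small self-contained toy model. It is only a sketch and does not use the driver: ToyCodec, ToyLazyDocument and ToyRawDocument are hypothetical names invented for this illustration, and "name=value" strings stand in for raw BSON elements. It assumes the lazy variant caches one fully decoded level on first access, while the raw variant decodes only the requested element on every access.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for raw BSON: each element is a "name=value" string that must be
// parsed before use. DecodeCount records how many elements were parsed.
static class ToyCodec
{
    public static int DecodeCount;

    public static KeyValuePair<string, int> Decode(string rawElement)
    {
        DecodeCount++;
        var parts = rawElement.Split('=');
        return new KeyValuePair<string, int>(parts[0], int.Parse(parts[1]));
    }
}

// Lazy strategy (hypothetical model): the first access decodes every element
// at this level and caches the result; later accesses are dictionary lookups.
sealed class ToyLazyDocument
{
    private readonly string[] _raw;
    private Dictionary<string, int> _decoded;

    public ToyLazyDocument(string[] raw) { _raw = raw; }

    public int this[string name]
    {
        get
        {
            if (_decoded == null)
            {
                _decoded = new Dictionary<string, int>();
                foreach (var element in _raw)
                {
                    var pair = ToyCodec.Decode(element);
                    _decoded[pair.Key] = pair.Value;
                }
            }
            return _decoded[name];
        }
    }
}

// Raw strategy (hypothetical model): nothing is cached; each access scans the
// raw elements and decodes only the one that matches.
sealed class ToyRawDocument
{
    private readonly string[] _raw;

    public ToyRawDocument(string[] raw) { _raw = raw; }

    public int this[string name]
    {
        get
        {
            foreach (var element in _raw)
            {
                if (element.StartsWith(name + "=", StringComparison.Ordinal))
                {
                    return ToyCodec.Decode(element).Value;
                }
            }
            throw new KeyNotFoundException(name);
        }
    }
}

static class Program
{
    static void Main()
    {
        var raw = new[] { "a=1", "b=2", "c=3" };

        ToyCodec.DecodeCount = 0;
        var lazy = new ToyLazyDocument(raw);
        var lazySum = lazy["b"] + lazy["b"]; // second access hits the cache
        Console.WriteLine("lazy: sum={0}, decodes={1}", lazySum, ToyCodec.DecodeCount); // lazy: sum=4, decodes=3

        ToyCodec.DecodeCount = 0;
        var rawDoc = new ToyRawDocument(raw);
        var rawSum = rawDoc["b"] + rawDoc["b"]; // each access decodes "b" again
        Console.WriteLine("raw: sum={0}, decodes={1}", rawSum, ToyCodec.DecodeCount); // raw: sum=4, decodes=2
    }
}
```

In this toy model the raw strategy does less work when only one or two elements are read a few times, while the lazy strategy pays once per level and then reads for free, which mirrors the guidance above on when to prefer each class.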
Here's some sample code using a LazyBsonDocument:
// sample code using LazyBsonDocument
var source = database.GetCollection<LazyBsonDocument>("source");
var destination = database.GetCollection<LazyBsonDocument>("destination");
foreach (var document in source.FindAll())
{
    document["timestamp"] = DateTime.UtcNow; // triggers one level of deserialization (note that document is mutable)
    destination.Insert(document); // only the top level needs to be reserialized
}
and some sample code using a RawBsonDocument:
// sample code using RawBsonDocument with output to a file
var source = database.GetCollection<RawBsonDocument>("source");
var destination = File.Create("destination.bson"); // destination could be a socket instead
foreach (var document in source.FindAll())
{
    // note that this code accesses the "export" and "_id" elements of the RawBsonDocument
    if (document["export"].ToBoolean())
    {
        destination.Write(document.Bytes, 0, document.Bytes.Length); // no reserialization required
        source.Update(Query.EQ("_id", document["_id"]), Update.Set("export", false)); // clear export flag
    }
}