- Type: New Feature
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.2.1
- Component/s: BSON
- None
Prologue
Since I am working on someone else's dime, I need a
justification for why I am spending so much time on something completely
different. Hence I must be able to at least say "I tried".
So a simple yes or no is sufficient. You do not have to explain
your decision. Both are just fine with me.
If you say yes – wonderful. If you say no, I have leverage to persuade
my employer to actually buy some support to get this into the
driver (and then some more to take the administrative burden off my
shoulders). That will probably be cheaper than me maintaining a fork of
the pymongo driver. As I said, break as much as possible in each new
release, so that I become more expensive than official support!
However, our application still has the need for fast on-the-fly
conversion, which I cannot justify by just saying
the driver does not support it. So here goes:
Judging by your comments, you seem to dislike the names of the callbacks:
This module provides two methods: `object_hook` and `default`. These
names are pretty terrible, but match the names used in Python's `json`
library ...
But do you dislike the concept? Because here it comes again (this time
borrowed from pickle, copy and MarkupSafe, the names are
inspired by emacs(1)).
Risks
None. All features have to be explicitly enabled. But on the other
hand – features are always dangerous and terrible things can happen :9.
Using the __bson__ object state hook does not present a problem,
since no other application should be using it.
Feature
When an element cannot be encoded – given that all features are
enabled – the following object methods are tried in order before an
exception is raised:
- __bson__(self)
- __getstate__(self)
__bson__ may return any type of data; __getstate__ is
required to return a dict. If neither succeeds, an attempt is made to
retrieve the object's __dict__ attribute for encoding.
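The lookup order described above can be sketched in plain Python. This is only an illustration of the order of attempts, not the code from the fork: the `HOOKS` dict, the `fallback_value` function and the example classes are hypothetical names made up for this sketch.

```python
# Hypothetical stand-in for the three feature switches (not the fork's code).
HOOKS = {"bson": True, "getstate": True, "dict": True}


def fallback_value(obj):
    """Return an encodable replacement for obj, trying the hooks in order."""
    if HOOKS["bson"]:
        hook = getattr(obj, "__bson__", None)
        if callable(hook):
            return hook()  # __bson__ may return any type of data
    if HOOKS["getstate"]:
        # Note: since Python 3.11 every object has a default __getstate__,
        # so this step can also succeed for plain objects.
        hook = getattr(obj, "__getstate__", None)
        if callable(hook):
            state = hook()
            if isinstance(state, dict):  # __getstate__ must return a dict
                return state
    if HOOKS["dict"]:
        state = getattr(obj, "__dict__", None)
        if isinstance(state, dict):
            return state
    raise TypeError("cannot encode object of type %s" % type(obj).__name__)


class WithBson(object):
    def __init__(self):
        self.secret = 1

    def __bson__(self):
        return "custom representation"


class WithGetstate(object):
    def __getstate__(self):
        return {"kind": "getstate"}


class Plain(object):
    def __init__(self):
        self.x = 1
```

In this sketch, `fallback_value(WithBson())` never looks at `__dict__`, because `__bson__` is tried first, while an object offering none of the three hooks ends in the `TypeError` that stands in for the encoder's exception.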
Each feature is turned on and off independently with the following
functions:
- enable_bson_hook([True|False])
- enable_getstate_hook([True|False])
- enable_dict_hook([True|False])
The activation status is checked with the following functions:
- is_bson_hook_enabled()
- is_getstate_hook_enabled()
- is_dict_hook_enabled()
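To show how such independent switches behave, here is a minimal stand-in written for this description. The function names match the list above, but the bodies, the module-level `_enabled` dict and the default argument of `True` are assumptions; the real implementation is in the commit linked below.

```python
# Hypothetical module-level state; the fork may store this differently.
_enabled = {"bson": False, "getstate": False, "dict": False}


def enable_bson_hook(on=True):
    _enabled["bson"] = bool(on)


def enable_getstate_hook(on=True):
    _enabled["getstate"] = bool(on)


def enable_dict_hook(on=True):
    _enabled["dict"] = bool(on)


def is_bson_hook_enabled():
    return _enabled["bson"]


def is_getstate_hook_enabled():
    return _enabled["getstate"]


def is_dict_hook_enabled():
    return _enabled["dict"]


# Everything starts disabled; each feature is switched on by itself.
enable_getstate_hook()        # assumed equivalent to enable_getstate_hook(True)
enable_dict_hook(True)
enable_dict_hook(False)       # and can be switched off again
```

After these calls only the __getstate__ feature is active in the sketch; the other two switches are untouched or back in their initial state.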
The feature is available at
https://github.com/wolfmanx/mongo-python-driver/commit/f61a96805c37e16d5544dd79ec126c1d98ea9550
Benefits
For the driver: None.
For an application:
This feature eliminates the need to copy and hold data. Given that
mongodb already competes with the application for memory, this is a
good thing.
Depending on the structure of the data and the availability of the C
extension, the time savings are significant.
Here is a rough comparison of the best officially supported
method and the __getstate__ feature for encoding 100 documents of
mixed object types with 1000 fields each:
Python 2 - C extension
:INF: waste_some_space_and_time_converting_data_with_BSON_encode()
total_time (100/1000) : 0:00:00.524321 factor: 44.63
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.015852 factor: 1.35
Python 3 - C extension
:INF: waste_some_space_and_time_converting_data_with_BSON_encode()
total_time (100/1000) : 0:00:00.612010 factor: 39.14
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.020740 factor: 1.33
Python 2 - pure Python
:INF: waste_some_space_and_time_converting_data_with_json_bson_default()
total_time (100/1000) : 0:00:01.231634 factor: 4.57
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.486110 factor: 1.8
Python 3 - pure Python
:INF: waste_some_space_and_time_converting_data_with_json_bson_default()
total_time (100/1000) : 0:00:00.983371 factor: 5.36
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.349842 factor: 1.91
Note: The reference time for each separate test run is different.
I.e., the times and factors are not comparable between different
setups.
Disadvantages
This is not the best solution. A default/object_pairs_hook
callback pair (as implemented in the json module) would be ideal,
but I don't have the time to implement the necessary API and pass it
down through the entire pymongo DB layer.
With a thread-local configuration this solution would be a very good
runner-up.
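For reference, this is what the callback pair looks like in the standard library's json module. The example uses only `json.dumps(default=...)` and `json.loads(object_pairs_hook=...)`; the `"$date"` marker key and the function names `encode_default` and `decode_pairs` are arbitrary choices for this sketch and say nothing about how a BSON API would look.

```python
import json
from datetime import date


def encode_default(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, date):
        return {"$date": obj.isoformat()}  # marker key chosen for this sketch
    raise TypeError("not JSON serializable: %r" % (obj,))


def decode_pairs(pairs):
    """Called by json.loads with the (key, value) pairs of every JSON object."""
    doc = dict(pairs)
    if set(doc) == {"$date"}:
        return date.fromisoformat(doc["$date"])
    return doc


text = json.dumps({"due": date(2012, 11, 5)}, default=encode_default)
restored = json.loads(text, object_pairs_hook=decode_pairs)
```

The conversion happens per call and per object, so no global switch is involved; that is the property the module-level toggles of this proposal lack.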
Caveats
Without a thread-local configuration, enabling the __getstate__
and __dict__ features without proper locking can lead to
unexpected results in concurrent threads encoding BSON data.
A documented global module lock would already go a long way.
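One way an application could apply such a lock is sketched below, under the assumption that every thread that encodes BSON goes through the same lock. The switch functions are stand-ins with the names from the Feature section; `_hook_lock` and the `getstate_hook` context manager are hypothetical helpers, not part of the driver or the fork.

```python
import threading
from contextlib import contextmanager

_hook_lock = threading.RLock()   # hypothetical global module lock
_getstate_enabled = False        # stand-in for the fork's internal flag


def enable_getstate_hook(on=True):
    global _getstate_enabled
    _getstate_enabled = bool(on)


def is_getstate_hook_enabled():
    return _getstate_enabled


@contextmanager
def getstate_hook(on=True):
    """Hold the lock, set the switch, and restore the old value on exit."""
    with _hook_lock:
        previous = is_getstate_hook_enabled()
        enable_getstate_hook(on)
        try:
            yield
        finally:
            enable_getstate_hook(previous)


with getstate_hook():
    # Encoding would happen here, with the feature guaranteed to be on.
    inside = is_getstate_hook_enabled()
after = is_getstate_hook_enabled()
```

This only helps if all encoding threads take the lock, which serializes encoding; a thread-local configuration would avoid that cost.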
Epilogue
Please, forgive me, I'm just an old dog, eager to learn a new trick,
but still hanging on to his old tricks, too.
- depends on: PYTHON-1750 Support codec callbacks for simple types (Closed)