In _cbsonmodule.c, the _type_marker function calls PyObject_HasAttrString(object, "_type_marker") and PyObject_GetAttrString(object, "_type_marker"). In my workloads (highly nested documents with many large array fields), these calls become a severe performance bottleneck, because each one creates a new Python string object by calling PyUnicode_FromString("_type_marker") every time it runs.
A simple change that helped substantially (~60% faster) was to create a global TYPEMARKERSTR object, initialize it once in PyInit__cbson with PyUnicode_FromString("_type_marker"), and replace PyObject_HasAttrString(object, "_type_marker") and PyObject_GetAttrString(object, "_type_marker") with PyObject_HasAttr(object, TYPEMARKERSTR) and PyObject_GetAttr(object, TYPEMARKERSTR). One caveat is that the TYPEMARKERSTR object is leaked if the cbson module is unloaded.
Also, correct me if I'm wrong, but I believe these lines are redundant because the function returns type at the end regardless.
- causes
  - PYTHON-3798 add error checking and visit for _type_marker_str (Closed)
  - PYTHON-3728 Coverity issue with convert_codec_options (Closed)
- is related to
  - PYTHON-3729 Speed up C BSON encoding by using PyObject_GetAttr instead of PyObject_GetAttrString (Closed)
  - PYTHON-3718 Faster INT2STRING (Closed)
  - PYTHON-3797 Cache commonly used strings in C extensions (Closed)
- related to
  - PYTHON-3819 Optimize BSON encoding/decoding performance (Released)