- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
libyajl is unmaintained and overly complex for our purposes: there are a ton of JSON parsing bugs open in libbson that are very hard to fix with libyajl. We currently vendor libyajl and we've patched it to perform better when parsing a stream of documents, but we can't use that patch on Debian, because Debian policy prohibits vendoring third-party libraries that are already available as Debian packages. This puts us in a Catch-22: we can't fix the problems upstream since upstream isn't maintained, and we can't ship our patched copy since Debian requires us to use its packaged libyajl.
We'll vendor a different JSON-parsing library: one that is not already packaged for Debian, that presents a simpler interface better suited to our needs, and that already performs well when parsing a stream of documents. I'd thought JSMN was a good fit, but it's not actually incremental, so it's not suitable for parsing a very large stream of one or more JSON objects. We'll use jsonsl instead. It's MIT licensed and actually designed for incremental parsing with callbacks. The code seems complex but good, and it's not in Debian (which is good news, since that leaves us free to vendor and patch it).
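To make "incremental" concrete, here is a minimal sketch of the property we need. It is not jsonsl's API: stream_scanner_t, scanner_init, scanner_feed and doc_end_cb are hypothetical names invented for illustration. The scanner only tracks nesting depth and string state, so it can report the end of each top-level object or array no matter how the input is split into chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical incremental scanner, NOT jsonsl's API. All state lives in
 * the struct, so a document may be split across any number of feed calls. */
typedef struct {
   int depth;     /* current {} / [] nesting depth */
   int in_string; /* nonzero while inside a "..." literal */
   int escaped;   /* nonzero right after a backslash inside a string */
   size_t n_docs; /* complete top-level documents seen so far */
} stream_scanner_t;

/* Called each time a top-level object or array is closed. */
typedef void (*doc_end_cb) (stream_scanner_t *s, void *user);

void
scanner_init (stream_scanner_t *s)
{
   s->depth = 0;
   s->in_string = 0;
   s->escaped = 0;
   s->n_docs = 0;
}

/* Consume one chunk of input. cb may be NULL. */
void
scanner_feed (stream_scanner_t *s,
              const char *buf,
              size_t len,
              doc_end_cb cb,
              void *user)
{
   size_t i;

   assert (s != NULL && buf != NULL);

   for (i = 0; i < len; i++) {
      char c = buf[i];

      if (s->in_string) {
         /* Braces and brackets inside strings must not affect depth. */
         if (s->escaped) {
            s->escaped = 0;
         } else if (c == '\\') {
            s->escaped = 1;
         } else if (c == '"') {
            s->in_string = 0;
         }
         continue;
      }

      if (c == '"') {
         s->in_string = 1;
      } else if (c == '{' || c == '[') {
         s->depth++;
      } else if ((c == '}' || c == ']') && s->depth > 0) {
         /* A stray close at depth 0 is silently ignored here; a real
          * parser should report it as an error. */
         if (--s->depth == 0) {
            s->n_docs++;
            if (cb) {
               cb (s, user);
            }
         }
      }
   }
}
```

Feeding the three chunks `{"a": "}`, `"} {"b": [1,` and `2]}` in order yields two completed documents, even though the first chunk ends in the middle of a string that contains a brace. The sketch does no validation and does not detect top-level scalars; it only shows why per-call state, rather than a whole-buffer tokenizer, is what a stream of documents needs.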
Local benchmarks of the first draft of the jsonsl port show a gain of between 1% and 15% on the JSON portions of the Driver Benchmark Suite.
Status:
- Done: libbson ported to jsonsl, passes all tests, plus a bit of additional testing for areas that seemed risky to me as I did the port
- Done: test with bad escape sequences. I think the "len" returned by jsonsl_util_unescape might be invalid if "err" is set.
- Done: escape sequences in keys, test and implement.
- Done: functions like static int _bson_json_read_null (void *_ctx) can be updated to return void and to take a bson_json_reader_t, a const char pointer, and an ssize_t, so I can delete a bunch of casts
- Done: were top-level arrays allowed with the previous implementation?
- Done: large amount of cleanup and additional error checks with tests for those missing checks
- Done: possible perf improvement from setting call_UESCAPE and handling escapes inline, instead of unescaping whole keys and string values?
- Done: the naked malloc and free in jsonsl need to be patched to use the BSON macros
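The bad-escape item above comes down to a caller pattern: check the error flag before trusting the returned length. A minimal sketch of that pattern follows. It does not call jsonsl; sketch_unescape and unescape_checked are hypothetical stand-ins, and the sketch handles only the one-character escapes (a \uXXXX escape is reported as an error here purely to keep it short).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT jsonsl_util_unescape. Decodes one-character
 * escapes from in[0..len) into out, which must have room for len bytes.
 * On a bad escape, *err is set to 1 and the return value must not be
 * trusted by the caller. */
size_t
sketch_unescape (const char *in, size_t len, char *out, int *err)
{
   size_t i = 0;
   size_t o = 0;

   assert (in != NULL && out != NULL && err != NULL);
   *err = 0;

   while (i < len) {
      char c = in[i++];

      if (c != '\\') {
         out[o++] = c;
         continue;
      }
      if (i == len) {
         *err = 1; /* dangling backslash at end of input */
         return o;
      }
      switch (in[i++]) {
      case '"':  out[o++] = '"';  break;
      case '\\': out[o++] = '\\'; break;
      case '/':  out[o++] = '/';  break;
      case 'b':  out[o++] = '\b'; break;
      case 'f':  out[o++] = '\f'; break;
      case 'n':  out[o++] = '\n'; break;
      case 'r':  out[o++] = '\r'; break;
      case 't':  out[o++] = '\t'; break;
      default:
         *err = 1; /* unknown escape such as \x */
         return o;
      }
   }
   return o;
}

/* Caller pattern: test err first, and only then use the length.
 * Returns 1 and stores the length on success; returns 0 and leaves
 * *out_len untouched on failure. */
int
unescape_checked (const char *in, size_t len, char *out, size_t *out_len)
{
   int err;
   size_t n = sketch_unescape (in, len, out, &err);

   if (err) {
      return 0;
   }
   *out_len = n;
   return 1;
}
```

With this shape, a value containing a bad escape never reaches the code that appends a string of length "len" to the document, which is the failure the test for bad escape sequences is meant to catch.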
- is depended on by
  - CDRIVER-994 "[{$numberLong: '1'}]" is BSON-encoded with array key name "1" (Closed)
  - CDRIVER-1942 JSON can be parsed despite stray "}" characters (Closed)
  - CDRIVER-1472 NaN handling difference between C and C++ driver (Closed)
  - CDRIVER-1913 Parse "$code" extended JSON (Closed)
- is related to
  - CDRIVER-597 Support building without requiring vendored YAJL (Closed)