  1. Python Driver
  2. PYTHON-1916

C message module write_dict encodes and decodes RawBSONDocument

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 3.9
    • Affects Version/s: None
    • Component/s: None
    • None

      When the C message module encodes a RawBSONDocument it inflates the top-level keys of the raw document (decoding them to Python objects) and then encodes them back into BSON.

      This is a bug because not all BSON values survive a round trip through Python objects. For example, a UUID stored with binary subtype 4 comes back with subtype 3:

      >>> from bson.binary import Binary
      >>> from uuid import uuid4
      >>> from bson import BSON
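      >>> from bson.raw_bson import RawBSONDocument, DEFAULT_RAW_BSON_OPTIONS
      >>> from pymongo import MongoClient
      >>> client = MongoClient()  # assumes a mongod listening on localhost:27017
      >>> coll = client.t.t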
      >>> doc = {'_id': 1, 'u': Binary(uuid4().bytes, 4)}
      >>> raw = RawBSONDocument(BSON.encode(doc))
      >>> coll.insert_one(raw)
      <pymongo.results.InsertOneResult object at 0x103d7e948>
      >>> raw_coll = coll.with_options(codec_options=DEFAULT_RAW_BSON_OPTIONS)
      >>> raw2 = raw_coll.find_one()
      >>> raw.raw
      b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\xdfM\xd3,r\x19H\xb8\x87h\x17\x81\xd2q\xcaK\x00'
      >>> raw2.raw
      b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03\xdfM\xd3,r\x19H\xb8\x87h\x17\x81\xd2q\xcaK\x00'
      >>> raw == raw2
      False
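The two byte strings above differ in a single byte, the binary subtype of the `u` field. The following is a minimal sketch, using only the standard library, that walks the top-level elements of each dump and reports that subtype. The helper name `binary_subtypes` is hypothetical and not part of PyMongo, and the parser only understands the two element types that appear in this report (int32 and binary).

```python
import struct


def binary_subtypes(raw: bytes) -> dict:
    """Return {field name: subtype} for top-level BSON binary fields.

    Illustrative only: handles int32 (0x10) and binary (0x05) elements,
    which is all the documents in this report contain.
    """
    (length,) = struct.unpack_from("<i", raw, 0)
    pos, end = 4, length - 1  # skip the length prefix, stop before the trailing NUL
    found = {}
    while pos < end:
        type_byte = raw[pos]
        name_end = raw.index(b"\x00", pos + 1)  # element names are C strings
        name = raw[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if type_byte == 0x10:  # int32: 4-byte value
            pos += 4
        elif type_byte == 0x05:  # binary: int32 size, subtype byte, payload
            (size,) = struct.unpack_from("<i", raw, pos)
            found[name] = raw[pos + 4]
            pos += 4 + 1 + size
        else:
            raise ValueError("unsupported BSON type 0x%02x" % type_byte)
    return found


# Bytes copied from the session above: what was inserted and what was read back.
INSERTED = (b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00'
            b'\x04\xdfM\xd3,r\x19H\xb8\x87h\x17\x81\xd2q\xcaK\x00')
READ_BACK = (b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00'
             b'\x03\xdfM\xd3,r\x19H\xb8\x87h\x17\x81\xd2q\xcaK\x00')

print(binary_subtypes(INSERTED))   # {'u': 4}
print(binary_subtypes(READ_BACK))  # {'u': 3}
```

The 16 payload bytes are identical in both dumps; only the subtype byte changed during the decode and re-encode pass.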
      

      This is also a performance issue because decoding and encoding a RawBSONDocument is unnecessary.

      The fix is simply to make the write_dict function check for RawBSONDocument, so that the raw bytes can be written as-is without a decode and re-encode pass.
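The idea behind the fix can be sketched in pure Python. This is not the C code from the driver: `RawDocument`, `encode_int32_document` and this `write_dict` signature are hypothetical stand-ins chosen for illustration, and the toy encoder only supports int32 values. The point is the early type check that copies already-encoded bytes straight into the output buffer.

```python
import struct


class RawDocument:
    """Hypothetical stand-in for RawBSONDocument: wraps already-encoded BSON."""

    def __init__(self, raw: bytes):
        self.raw = raw


def encode_int32_document(doc: dict) -> bytes:
    """Toy BSON encoder supporting int32 values only (illustration, not the driver's)."""
    body = b""
    for key, value in doc.items():
        body += b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    # Total length = 4-byte length prefix + body + trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def write_dict(buffer: bytearray, doc) -> None:
    """Append the BSON form of doc to buffer, passing raw documents through untouched."""
    if isinstance(doc, RawDocument):
        buffer.extend(doc.raw)  # no decode, no re-encode
        return
    buffer.extend(encode_int32_document(doc))


raw_bytes = (b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00'
             b'\x04\xdfM\xd3,r\x19H\xb8\x87h\x17\x81\xd2q\xcaK\x00')
out = bytearray()
write_dict(out, RawDocument(raw_bytes))
print(bytes(out) == raw_bytes)  # True: the subtype-4 byte is preserved
```

Because the raw bytes never pass through Python objects, values that cannot be round-tripped are left exactly as they were encoded, and the unnecessary decoding work is skipped.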

            Assignee:
            shane.harvey@mongodb.com Shane Harvey
            Reporter:
            shane.harvey@mongodb.com Shane Harvey
            Votes:
            0
            Watchers:
            1
