Go Driver / GODRIVER-2311

Byte array reuse in BSON unmarshalling may cause duplicated values

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker - P1
    • Fix Version/s: 1.9.0, 1.8.5, 1.7.6
    • Affects Version/s: 1.0.4
    • Component/s: None
    • None
    • Needed

      Updated:
      Confirmed as a bug. The following steps trigger it:

      1. Load a BSON document into a []byte (e.g. from a file, a server response, etc.)
      2. Unmarshal the BSON document into any type that contains a []byte field, like a user-defined struct or bson.D.
      3. Modify the bytes in the input []byte.
      4. Observe that the contents of the []byte field in the unmarshaled value changed.

      Check out a repro example here (note that the example doesn't repro the problem on the Go Playground as of the 1.8.5/1.9.0 releases, which fix the bug): https://go.dev/play/p/-BjGJ9OjAVB

      Note that this only applies to unmarshaling into byte slice values, not byte array values. For example, values unmarshaled to a struct containing a [16]byte field are not affected. However, the same BSON document unmarshaled to a bson.D will infer the value type is a []byte and will be affected.
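      The trigger conditions above can be illustrated without the driver. The sketch below is not the driver's code: decodeAliased and decodeCopied are hypothetical stand-ins for a decoder that returns a sub-slice of its input and one that copies the bytes out, and the offsets are arbitrary.

```go
package main

import "fmt"

// decodeAliased is a hypothetical decoder that mimics the bug: the returned
// slice shares its backing array with the input document.
func decodeAliased(doc []byte) []byte {
	return doc[2:6]
}

// decodeCopied is a hypothetical decoder that copies the field out, so the
// returned slice is independent of the input document.
func decodeCopied(doc []byte) []byte {
	out := make([]byte, 4)
	copy(out, doc[2:6])
	return out
}

func main() {
	// Steps 1 and 2: load a "document" and decode a byte slice field from it.
	doc := []byte{0, 0, 'a', 'b', 'c', 'd', 0}
	aliased := decodeAliased(doc)
	copied := decodeCopied(doc)

	// Step 3: modify the bytes in the input.
	copy(doc[2:6], "wxyz")

	// Step 4: the aliased value follows the input, the copied value does not.
	fmt.Printf("aliased: %s\n", aliased) // aliased: wxyz
	fmt.Printf("copied:  %s\n", copied)  // copied:  abcd
}
```

      Only the slice-returning path is affected here, which matches the note that fixed-size array fields such as [16]byte are not: assigning into an array always copies the bytes.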

      Original:
      Some users of a mongopush fork are having issues with duplication of UUIDs in unmarshalled values. Specifically, when reading an oplog written to a file here, some UUID fields in the unmarshalled values can be duplicated.

      The duplication is fixed by this commit, which makes a copy of the input file byte buffer. That fix suggests the root cause may be input byte array reuse in the returned value (i.e. the BSON Unmarshal function returns an unmarshalled value whose byte slices point to sections of the same byte array as the input data). That can lead to unexpected value duplication or corruption if the input byte array is modified after unmarshalling a value. Modifying the input byte slice/array after Unmarshal returns is a valid use case, so callers should not have to avoid it.
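      A minimal sketch of how a reused read buffer could turn that aliasing into duplicated values, and why copying the input hides it. This assumes a hypothetical extractID decoder that returns a sub-slice of its input; it is not the mongopush or driver code, and the 4-byte records are made up for illustration.

```go
package main

import "fmt"

// extractID is a hypothetical decoder that returns a sub-slice of its input,
// mimicking the suspected aliasing behavior.
func extractID(record []byte) []byte {
	return record[:4]
}

// readIDs pushes every record through one shared buffer, the way a file
// reader might. When copyInput is true, the buffer is copied before decoding,
// which is the shape of the workaround described above.
func readIDs(records [][]byte, copyInput bool) [][]byte {
	buf := make([]byte, 4) // reused for every record
	var ids [][]byte
	for _, r := range records {
		copy(buf, r)
		in := buf
		if copyInput {
			in = append([]byte(nil), buf...) // independent copy of the input
		}
		ids = append(ids, extractID(in))
	}
	return ids
}

func main() {
	records := [][]byte{[]byte("AAAA"), []byte("BBBB"), []byte("CCCC")}

	// Without the copy, every ID aliases buf and ends up equal to the last record.
	for _, id := range readIDs(records, false) {
		fmt.Println("shared buffer:", string(id))
	}

	// With the copy, each ID keeps its own bytes.
	for _, id := range readIDs(records, true) {
		fmt.Println("copied input: ", string(id))
	}
}
```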

      Try to detect the possible issue using the following process:

      1. Create a set of input structs containing different value types, including byte slice types (e.g. []byte, uuid.UUID, etc).
      2. Marshal each input struct value as BSON.
      3. Record the low and high addresses of the output byte slice and underlying array.
      4. Unmarshal the marshalled bytes into a bson.D.
      5. Record the low and high addresses of the byte slice-type values in the unmarshalled bson.D.
      6. Check if any of the byte slice-type value addresses are in the memory address range of the input byte slice/array.

      E.g. getting underlying array addresses of a slice:

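      // Requires imports: fmt, reflect, unsafe.
      s := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

      // View the slice through its header to read the backing array address.
      sh := (*reflect.SliceHeader)(unsafe.Pointer(&s))

      fmt.Println("Low address", sh.Data)
      // The high address is the last byte of the backing array (capacity, not length).
      fmt.Println("High address", sh.Data+uintptr(sh.Cap-1))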
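      The address comparison in steps 3 to 6 could be wrapped in a small helper. This is a sketch under assumptions: backingRange and overlaps are hypothetical names, the addresses are used only for diagnostics, and the range starts at the slice's first element, so bytes of the array that precede a sub-slice are not covered.

```go
package main

import (
	"fmt"
	"unsafe"
)

// backingRange returns the address of b's first element and the address one
// past the end of its capacity. An empty-capacity slice yields (0, 0).
func backingRange(b []byte) (lo, hi uintptr) {
	if cap(b) == 0 {
		return 0, 0
	}
	full := b[:cap(b)]
	lo = uintptr(unsafe.Pointer(&full[0]))
	return lo, lo + uintptr(cap(b))
}

// overlaps reports whether the memory reachable from value intersects the
// memory reachable from input.
func overlaps(input, value []byte) bool {
	inLo, inHi := backingRange(input)
	valLo, valHi := backingRange(value)
	return inLo < valHi && valLo < inHi
}

func main() {
	input := make([]byte, 16)

	aliased := input[4:8]                          // shares input's backing array
	copied := append([]byte(nil), input[4:8]...)   // separate allocation

	fmt.Println("aliased overlaps input:", overlaps(input, aliased)) // true
	fmt.Println("copied overlaps input: ", overlaps(input, copied))  // false
}
```

      In a real check, input would be the marshalled BSON bytes and value each byte slice found in the unmarshalled bson.D.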
      

            Assignee:
            matt.dale@mongodb.com Matt Dale
            Reporter:
            matt.dale@mongodb.com Matt Dale