GODRIVER-3472 (project: Go Driver)

Support unmarshaling Vector binary values directly into []int8 and []float32

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: BSON
    • Go Drivers

      Context

      Currently (assuming this PR is merged as-is), users must decode Vector data (BSON Binary subtype 9) into the bson.Vector type. To work with the numeric values, they then need an extra step to extract them from the Vector. We generally expect users to know which type of vector data is stored, in which case that extra step is unnecessary. For int8 and float32 vectors, we should give users a shortcut that decodes the numeric data directly into a []int8 or []float32.
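To make the "extra step" concrete at the byte level, here is a minimal sketch. It assumes the layout from the BSON Binary Vector specification: a dtype byte (0x03 for int8, 0x27 for float32), a padding byte that must be 0 for these two types, then the values, with float32 values stored as 4-byte little-endian IEEE 754. The function names (splitVector, extractInt8, extractFloat32) are hypothetical and are not part of the driver's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// dtype bytes as given in the BSON Binary Vector (subtype 9) spec.
const (
	dtypeInt8    byte = 0x03
	dtypeFloat32 byte = 0x27
)

// splitVector (hypothetical) checks the 2-byte header (dtype, padding) of a
// raw subtype 9 payload and returns the data bytes that follow it.
func splitVector(payload []byte, want byte) ([]byte, error) {
	if len(payload) < 2 {
		return nil, errors.New("vector payload shorter than 2-byte header")
	}
	if payload[0] != want {
		return nil, fmt.Errorf("vector dtype 0x%02x, want 0x%02x", payload[0], want)
	}
	if payload[1] != 0 {
		return nil, errors.New("non-zero padding is only valid for PACKED_BIT vectors")
	}
	return payload[2:], nil
}

// extractInt8 reinterprets each data byte as a signed 8-bit integer.
func extractInt8(payload []byte) ([]int8, error) {
	data, err := splitVector(payload, dtypeInt8)
	if err != nil {
		return nil, err
	}
	out := make([]int8, len(data))
	for i, b := range data {
		out[i] = int8(b)
	}
	return out, nil
}

// extractFloat32 reads consecutive 4-byte little-endian IEEE 754 floats.
func extractFloat32(payload []byte) ([]float32, error) {
	data, err := splitVector(payload, dtypeFloat32)
	if err != nil {
		return nil, err
	}
	if len(data)%4 != 0 {
		return nil, errors.New("float32 vector data length is not a multiple of 4")
	}
	out := make([]float32, len(data)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(data[4*i:]))
	}
	return out, nil
}

func main() {
	// int8 vector {0, 1, 2, -1}: header 0x03 0x00, then one byte per element.
	i8, err := extractInt8([]byte{0x03, 0x00, 0x00, 0x01, 0x02, 0xFF})
	fmt.Println(i8, err) // [0 1 2 -1] <nil>

	// float32 vector {1.5}: 1.5 is 0x3FC00000, stored little-endian.
	f32, err := extractFloat32([]byte{0x27, 0x00, 0x00, 0x00, 0xC0, 0x3F})
	fmt.Println(f32, err) // [1.5] <nil>
}
```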

      For example, the following should work:

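      var coll *mongo.Collection // an existing, connected collection
      ctx := context.TODO()

      // Insert a document whose "vec" field is an int8 Vector (Binary subtype 9).
      d := bson.D{{Key: "vec", Value: bson.NewVector([]int8{0, 1, 2, 3})}}
      if _, err := coll.InsertOne(ctx, d); err != nil {
      	log.Fatal(err)
      }

      // Proposed: decode the Vector directly into a []int8 field.
      var res struct {
      	Vec []int8
      }
      if err := coll.FindOne(ctx, bson.D{}).Decode(&res); err != nil {
      	log.Fatal(err)
      }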
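One way to picture the proposed shortcut: the decoder dispatches on the destination type, so a []int8 field accepts only int8 vectors, a []float32 field accepts only float32 vectors, and a mismatch is an error rather than a silent conversion. This is a standalone sketch over the raw subtype 9 bytes, assuming the same layout as the BSON Binary Vector spec; decodeVectorInto is a hypothetical name and does not reflect how the driver's codecs are actually structured.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// decodeVectorInto (hypothetical) decodes a raw BSON Binary subtype 9 payload
// straight into a *[]int8 or *[]float32, rejecting mismatched vector types.
func decodeVectorInto(payload []byte, dst any) error {
	if len(payload) < 2 {
		return errors.New("vector payload shorter than 2-byte header")
	}
	dtype, padding, data := payload[0], payload[1], payload[2:]
	if padding != 0 {
		// Only PACKED_BIT vectors use padding; they are not handled here.
		return errors.New("non-zero padding: decode into a Vector instead")
	}
	switch d := dst.(type) {
	case *[]int8:
		if dtype != 0x03 { // assumed INT8 dtype byte
			return fmt.Errorf("cannot decode vector dtype 0x%02x into []int8", dtype)
		}
		out := make([]int8, len(data))
		for i, b := range data {
			out[i] = int8(b)
		}
		*d = out
	case *[]float32:
		if dtype != 0x27 { // assumed FLOAT32 dtype byte
			return fmt.Errorf("cannot decode vector dtype 0x%02x into []float32", dtype)
		}
		if len(data)%4 != 0 {
			return errors.New("float32 vector data length is not a multiple of 4")
		}
		out := make([]float32, len(data)/4)
		for i := range out {
			out[i] = math.Float32frombits(binary.LittleEndian.Uint32(data[4*i:]))
		}
		*d = out
	default:
		return fmt.Errorf("unsupported destination type %T", dst)
	}
	return nil
}

func main() {
	// The int8 vector {0, 1, 2, 3} from the example, as raw subtype 9 bytes.
	raw := []byte{0x03, 0x00, 0x00, 0x01, 0x02, 0x03}

	var vec []int8
	err := decodeVectorInto(raw, &vec)
	fmt.Println(vec, err) // [0 1 2 3] <nil>

	// Decoding an int8 vector into []float32 is refused, not converted.
	var wrong []float32
	fmt.Println(decodeVectorInto(raw, &wrong) != nil) // true
}
```

Refusing to convert between int8 and float32 keeps the shortcut lossless; anything it can't represent still decodes into bson.Vector.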
      

      PACKED_BIT vectors carry both a bit array and a padding value, so the Vector type is still the best type to decode them into, despite the extra step required to get to the data. It's not yet clear which Vector types will be most widely used.
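To see why a bare []byte would not be enough for PACKED_BIT data, consider how the padding value changes the vector's length. In this sketch, padding is taken to be the number of trailing bits in the final byte that are not part of the vector, so the element count is 8*len(data) - padding. Reading bits most-significant first is an assumption here, and unpackBits is a hypothetical helper.

```go
package main

import "fmt"

// unpackBits (hypothetical) expands PACKED_BIT data into one bool per element.
// padding is the number of unused trailing bits in the final byte, so the
// vector length is 8*len(data) - padding. Bits are read most-significant
// first (an assumption about bit order for this sketch).
func unpackBits(data []byte, padding uint8) ([]bool, error) {
	if padding > 7 {
		return nil, fmt.Errorf("padding %d out of range 0-7", padding)
	}
	if len(data) == 0 && padding != 0 {
		return nil, fmt.Errorf("padding %d with no data bytes", padding)
	}
	n := 8*len(data) - int(padding)
	out := make([]bool, n)
	for i := 0; i < n; i++ {
		out[i] = data[i/8]&(0x80>>(i%8)) != 0
	}
	return out, nil
}

func main() {
	// Same data byte, different padding: two different vectors.
	a, _ := unpackBits([]byte{0b1011_0000}, 0)
	b, _ := unpackBits([]byte{0b1011_0000}, 4)
	fmt.Println(len(a), len(b)) // 8 4
	fmt.Println(b)              // [true false true true]
}
```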

      Definition of Done

      • Users must be able to decode int8 Vector data into a struct field that is type []int8.
      • Users must be able to decode float32 Vector data into a struct field that is type []float32.

      Pitfalls

      • Our understanding of how customers will use Vector data in Go applications is limited. This suggested improvement may seem more or less useful as our understanding increases.
      • We currently assume there are at least as many use cases for int8 and float32 vector data as for PACKED_BIT data. However, we may be wrong: if PACKED_BIT sees far more use, this suggested improvement becomes much less useful.
      • The Vector data format may change in a way that makes []int8 and []float32 unable to represent the Vector without losing data.

            Assignee: Unassigned
            Reporter: Matt Dale (matt.dale@mongodb.com)
            Votes: 0
            Watchers: 1
