Python Driver / PYTHON-3684

Allow encoding subclasses of built-in types, like pandas.NaT

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Unknown
    • 4.12
    • Affects Version/s: None
    • Component/s: None
    • None

      PyMongo doesn't support encoding pandas.NaT objects. Encoding one results in an error:
      
      bindings/python/test/test_pandas.py:114: in round_trip
          res = write(self.coll, data)
      bindings/python/pymongoarrow/api.py:394: in write
          enc_tab = RawBSONDocument(encode(next(tabular_gen), codec_options=codec_options))
      ../../work/pycharm/mongo-arrow/lib/python3.10/site-packages/bson/__init__.py:1021: in encode
          return _dict_to_bson(document, check_keys, codec_options)
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
       
      >   ???
      E   ValueError: NaTType does not support utcoffset
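
The shape of this failure can be reproduced without pandas or PyMongo. The following is a minimal sketch, assuming the encoder treats any datetime instance as a real datetime and asks it for its UTC offset before converting to milliseconds. `FakeNaT` and `to_millis` are hypothetical stand-ins written for illustration, not pandas or bson code.

```python
import calendar
from datetime import datetime


class FakeNaT(datetime):
    """Hypothetical stand-in for pandas.NaT: a datetime subclass with no usable offset."""

    def utcoffset(self):
        raise ValueError("NaTType does not support utcoffset")


def to_millis(value: datetime) -> int:
    """Illustrative datetime-to-epoch-milliseconds conversion (an assumption, not bson's code)."""
    offset = value.utcoffset()  # raises for FakeNaT
    if offset is not None:
        value = value - offset
    return calendar.timegm(value.timetuple()) * 1000 + value.microsecond // 1000


nat = FakeNaT(1970, 1, 1)
assert isinstance(nat, datetime)  # so a datetime-aware encoder picks it up

try:
    to_millis(nat)
except ValueError as exc:
    error = str(exc)
```

Because the stand-in passes the `isinstance` check, it is routed down the datetime path and fails there, which matches the error message in the traceback above.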
      

      Ideally we could register a type encoder for this custom type, but TypeRegistry rejects the encoder because NaT's type is a subclass of datetime.datetime:

      bindings/python/test/test_arrow.py:205: in round_trip
          res = write(self.coll, data)
      bindings/python/pymongoarrow/api.py:394: in write
          type_registry = TypeRegistry([_PandasNAEncoder(), _PandasNaTEncoder()], fallback_encoder=_fallback_encoder)
      ../../work/pycharm/mongo-arrow/lib/python3.10/site-packages/bson/codec_options.py:162: in __init__
          self._validate_type_encoder(codec)
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      
      self = TypeRegistry(type_codecs=[<pymongoarrow.api._PandasNAEncoder object at 0x1354fbc40>, <pymongoarrow.api._PandasNaTEncoder object at 0x1354fa770>], fallback_encoder=<function _fallback_encoder at 0x134d70a60>)
      codec = <pymongoarrow.api._PandasNaTEncoder object at 0x1354fa770>
      
          def _validate_type_encoder(self, codec: _Codec) -> None:
              from bson import _BUILT_IN_TYPES
          
              for pytype in _BUILT_IN_TYPES:
                  if issubclass(cast(TypeCodec, codec).python_type, pytype):
                      err_msg = (
                          "TypeEncoders cannot change how built-in types are "
                          "encoded (encoder %s transforms type %s)" % (codec, pytype)
                      )
      >               raise TypeError(err_msg)
      E               TypeError: TypeEncoders cannot change how built-in types are encoded (encoder <pymongoarrow.api._PandasNaTEncoder object at 0x1354fa770> transforms type <class 'datetime.datetime'>)
      

      We need to add a PyMongo feature to support this use case efficiently. Perhaps we can loosen the restriction so that type encoders are allowed for subclasses of built-in types, but not for the built-in types themselves.
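
One way to picture the loosened restriction, as a sketch only: reject an encoder whose Python type is exactly a built-in type, but accept strict subclasses. `BUILT_IN_TYPES`, `validate_type_encoder`, and `FakeNaTType` below are hypothetical stand-ins for illustration, not PyMongo's `_BUILT_IN_TYPES` or `_validate_type_encoder`.

```python
from datetime import datetime

# Hypothetical, abbreviated stand-in for the types BSON encodes natively.
BUILT_IN_TYPES = (bool, bytes, datetime, dict, float, int, list, str, tuple, type(None))


class FakeNaTType(datetime):
    """Hypothetical stand-in for pandas' NaTType, a datetime subclass."""


def validate_type_encoder(python_type: type) -> None:
    """Reject exact built-in types but allow their subclasses."""
    if python_type in BUILT_IN_TYPES:  # exact match instead of issubclass()
        raise TypeError(
            "TypeEncoders cannot change how built-in types are "
            "encoded (type %s)" % (python_type,)
        )


validate_type_encoder(FakeNaTType)  # accepted: a subclass, not the exact type

try:
    validate_type_encoder(datetime)
except TypeError:
    rejected_exact = True
else:
    rejected_exact = False
```

This only covers registration. How the encoder should dispatch on subclass instances at encode time is a separate question that the sketch does not address.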

            Assignee: Unassigned
            Reporter: Shane Harvey (shane.harvey@mongodb.com)
            Votes: 0
            Watchers: 1
