- Type: Task
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: BSON
Environment: Python 2.7
One collection in MongoDB has about 8 million records (800W).
I use:
result = collection.find()
for record in result:
    print record
After it has run for more than 50 minutes, the following error occurs:
Start query data
2018-12-04 23:28:19,875 - logger.py: 37 - ERROR - Exception when loading data: SON([(u'uid', u'336543dfafd443d0872b48cda0e13333'), (u'platformCode', u'xxxx'), (u'loginPlatformCode', u'xxx'), (u'createTime', u'2018-10-15 10:58:25'), (u'eventName', u'\u70b9\u51fb\u753b\u7b14\u56fe\u6807'), (u'simpleName', u'xxx'), (u'sn', u'xxxx'), (u'courseNum', u'aaaa'), (u'_id', {'$oid': '5bc4028d10d57c78d9658455'}), (u'type', u'1'), (u'packageName', u'xxxx')])
'utf8' codec can't decode byte 0xce in position 29: invalid continuation byte
Traceback (most recent call last):
  File "/schedule.wordir/etl/data_loader.py", line 45, in run
    for record in input.extract():
  File "/schedule.wordir/etl/input/data_input_mongo_full.py", line 56, in extract
    for record in result:
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1169, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1106, in _refresh
    self.__send_message(g)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 971, in __send_message
    codec_options=self.__codec_options)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1055, in _unpack_response
    return response.unpack_response(cursor_id, codec_options)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/message.py", line 945, in unpack_response
    return bson.decode_all(self.documents, codec_options)
InvalidBSON: 'utf8' codec can't decode byte 0xce in position 29: invalid continuation byte
However, when I fetch only that specific record (u'_id', {'$oid': '5bc4028d10d57c78d9658455'}) with the same code, it works fine.
I found a similar question on Stack Overflow:
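For reference, the decode error in the traceback can be reproduced without MongoDB. The sketch below is only an illustration: the byte string `bad` and the helper `try_decode` are hypothetical and are not taken from the failing document. It shows that a 0xce byte followed by a byte that is not a UTF-8 continuation byte triggers the same "invalid continuation byte" message under strict decoding, while the "replace" error handler substitutes U+FFFD instead of raising.

```python
# Hypothetical bytes for illustration only: 0xce opens a two-byte UTF-8
# sequence, but the next byte (0x41, "A") is not a continuation byte.
bad = b"abc\xceAdef"


def try_decode(raw, errors="strict"):
    """Decode raw bytes as UTF-8 and return (text, error_message)."""
    try:
        return raw.decode("utf-8", errors), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Strict decoding fails, as in the traceback above.
strict_text, strict_err = try_decode(bad)

# The "replace" handler swaps the undecodable byte for U+FFFD.
lenient_text, lenient_err = try_decode(bad, errors="replace")

print(strict_err)
print(repr(lenient_text))
```

PyMongo's `CodecOptions` accepts a `unicode_decode_error_handler` argument that takes the same handler names, so passing "replace" or "ignore" there may be a way to get past the bad string while locating the document that contains it. Whether that is appropriate here is an assumption, since the invalid bytes are still stored in the collection.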