-
Type: Task
-
Resolution: Unresolved
-
Priority: Unknown
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
While working on PYTHON-2673, we ran into what we believe is a bug in CPython's garbage collector. To workaround the problem we needed to add the Pool.__pinned_sockets property which holds references to each SocketInfo checked out of the pool.
Without the Pool.__pinned_sockets workaround, the bug triggered when:
- the client is created with loadBalanced=True
- a find or aggregate cursor fails on the initial attempt
- the retry attempt succeeds
- then soon after the cursor is garbage collected without being iterated or closed explicitly
Here's some logs from a repro (https://github.com/ShaneHarvey/mongo-python-driver/blob/2054e80c0cd815171b345241d2dd2ba3642ce604/test/load_balancer/test_load_balancer.py#L78):
Running Find 2021-06-10 18:00:18,617 INFO event_loggers [pool ('127.0.0.1', 8000)] connection check out started 2021-06-10 18:00:18,618 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #1] connection created 2021-06-10 18:00:18,645 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #1] connection setup succeeded 2021-06-10 18:00:18,645 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #1] connection checked out of pool get_socket checkout: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61222), raddr=('127.0.0.1', 8000)>) at 0x7f8e156389a0 _close_socket: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61222), raddr=('127.0.0.1', 8000)>) at 0x7f8e156389a0 get_socket checkout error: SocketInfo(<socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>) CLOSED at 0x7f8e156389a0: 127.0.0.1:8000: connection closed 2021-06-10 18:00:18,648 INFO event_loggers [pool ('127.0.0.1', 8000)] pool cleared return socket: SocketInfo(<socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>) CLOSED at 0x7f8e156389a0 2021-06-10 18:00:18,648 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #1] connection checked into pool 2021-06-10 18:00:18,649 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #1] connection closed, reason: error server.pool after error: deque([]) 2021-06-10 18:00:18,649 INFO event_loggers [pool ('127.0.0.1', 8000)] connection check out started 2021-06-10 18:00:18,650 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #2] connection created 2021-06-10 18:00:18,652 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #2] connection setup succeeded 2021-06-10 18:00:18,652 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #2] connection checked out of pool get_socket checkout: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 server.pool after success: deque([]) Created _SocketManager: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 get_socket: SocketInfo(<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61216), raddr=('127.0.0.1', 8000)>) at 0x7f8e075693a0 return socket: SocketInfo(<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61216), raddr=('127.0.0.1', 8000)>) at 0x7f8e075693a0 sockets.appendleft: SocketInfo(<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61216), raddr=('127.0.0.1', 8000)>) at 0x7f8e075693a0 sock: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 Pool before cursor del: deque([]) Pool after cursor del: deque([]) gc: collectable <Cursor 0x7f8df5d292e0> gc: collectable <dict 0x7f8df5d65b40> gc: collectable <Collection 0x7f8e07569fd0> gc: collectable <dict 0x7f8df5d65a80> gc: collectable <Database 0x7f8e07569e20> gc: collectable <CodecOptions 0x7f8df5d18c10> gc: collectable <dict 0x7f8df5d23d40> gc: collectable <_ServerSession 0x7f8e15a157c0> gc: collectable <Binary 0x7f8df5d2db80> gc: collectable <dict 0x7f8e12d9d780> gc: collectable <dict 0x7f8e14ab2a40> gc: collectable <SessionOptions 0x7f8e14ff4100> gc: collectable <ClientSession 0x7f8e14fe3b20> gc: collectable <dict 0x7f8e13d19840> gc: collectable <_Transaction 0x7f8e15256cd0> gc: collectable <_Query 0x7f8e147dd880> gc: collectable <method 0x7f8df5d2f080> gc: collectable <cell 0x7f8e15256c10> gc: collectable <cell 0x7f8e153f5a00> gc: collectable <cell 0x7f8e153f59a0> gc: collectable <cell 0x7f8e152cea90> gc: collectable <tuple 0x7f8df5d2da90> gc: collectable <function 0x7f8e1543b1f0> gc: collectable <function 0x7f8e15639c10> gc: collectable <frame 0x7f8df5f0e740> gc: collectable <frame 0x7f8e1553fc90> gc: collectable <_MongoClientErrorHandler 0x7f8df5d25760> gc: collectable <frame 0x7f8e15540a10> gc: collectable <socket 0x7f8e156dfb20> gc: collectable <SocketInfo 0x7f8e156389a0> gc: collectable <dict 0x7f8df5d2f200> gc: collectable <set 0x7f8e15a13660> gc: collectable <SocketChecker 0x7f8e15638af0> gc: collectable <SON 0x7f8e156e80e0> gc: collectable <dict 0x7f8e13d5c140> gc: collectable <list 0x7f8e147959c0> gc: collectable <tuple 0x7f8e1578de40> gc: collectable <tuple 0x7f8df5d08040> gc: collectable <managedbuffer 0x7f8e152660c0> gc: collectable <memoryview 0x7f8e08e0bf40> gc: collectable <tuple 0x7f8e152c2ca0> gc: collectable <OSError 0x7f8df5d51240> gc: collectable <traceback 0x7f8e08e68a00> gc: collectable <frame 0x7f8e1554f440> gc: collectable <traceback 0x7f8e08e68240> gc: collectable <frame 0x7f8e1554f0e0> gc: collectable <traceback 0x7f8e08e68340> gc: collectable <AutoReconnect 0x7f8e15796c40> gc: collectable <tuple 0x7f8e156389d0> gc: collectable <set 0x7f8e15a13820> gc: collectable <dict 0x7f8df5d2b300> gc: collectable <list 0x7f8e154509c0> gc: collectable <traceback 0x7f8e08e68780> gc: collectable <frame 0x7f8df5d35cf0> gc: collectable <traceback 0x7f8e1475d4c0> gc: collectable <frame 0x7f8df5d7d580> gc: collectable <traceback 0x7f8df5d23a00> gc: collectable <frame 0x7f8df5d27dd0> gc: collectable <traceback 0x7f8df5d237c0> gc: collectable <frame 0x7f8e155cf020> gc: collectable <traceback 0x7f8df5d238c0> gc: collectable <frame 0x7f8df5d30c40> gc: collectable <traceback 0x7f8df5d23940> gc: collectable <traceback 0x7f8df5d23840> gc: collectable <traceback 0x7f8df5d23c80> gc: collectable <traceback 0x7f8df5d23fc0> gc: collectable <socket 0x7f8df5d734c0> gc: collectable <SocketInfo 0x7f8e1543e7c0> gc: collectable <dict 0x7f8e14795a00> gc: collectable <set 0x7f8e15a13900> gc: collectable <SocketChecker 0x7f8e15638340> gc: collectable <ObjectId 0x7f8e14ab95b0> gc: collectable <_OpMsg 0x7f8df5d04340> gc: collectable <list 0x7f8e08e668c0> gc: collectable <list 0x7f8e157d83c0> gc: collectable <ObjectId 0x7f8e14ab9580> gc: collectable <dict 0x7f8df5d65900> gc: collectable <ObjectId 0x7f8e1525b370> gc: collectable <dict 0x7f8df5d658c0> gc: collectable <ObjectId 0x7f8df5d29400> gc: collectable <dict 0x7f8df5d65a40> gc: collectable <dict 0x7f8df5d04440> gc: collectable <Int64 0x7f8df5d65c40> gc: collectable <dict 0x7f8df5d23a40> gc: collectable <Timestamp 0x7f8df5d298b0> gc: collectable <PinnedResponse 0x7f8df5d73760> gc: collectable <frame 0x7f8df5f09ae0> gc: collectable <frame 0x7f8df5f09140> gc: collectable <_SocketManager 0x7f8e15619f70> gc: collectable <dict 0x7f8e08e666c0> gc: collectable <collections.deque 0x7f8df5d73640> gc: collectable <frame 0x7f8df5f08b60> gc: collectable <frame 0x7f8e14be5cd0> gc: collectable <frame 0x7f8df5d75d40> Cursor.__die _SocketManager.close: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 return socket: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 2021-06-10 18:00:18,687 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #2] connection checked into pool sockets.appendleft: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 _SocketManager.__del__: None /Users/shane/git/mongo-python-driver/test/load_balancer/test_load_balancer.py:99: ResourceWarning: unclosed <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)> gc.collect() Object allocated at (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/suite.py", lineno 84 return self.run(*args, **kwds) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/suite.py", lineno 122 test(result) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/suite.py", lineno 84 return self.run(*args, **kwds) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/suite.py", lineno 122 test(result) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/suite.py", lineno 84 return self.run(*args, **kwds) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/suite.py", lineno 122 test(result) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", lineno 736 return self.run(*args, **kwds) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", lineno 676 self._callTestMethod(testMethod) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", lineno 633 method() File "/Users/shane/git/mongo-python-driver/test/load_balancer/test_load_balancer.py", lineno 83 next(cursor) File "/Users/shane/git/mongo-python-driver/pymongo/cursor.py", lineno 1237 if len(self.__data) or self._refresh(): File "/Users/shane/git/mongo-python-driver/pymongo/cursor.py", lineno 1158 self.__send_message(q) File "/Users/shane/git/mongo-python-driver/pymongo/cursor.py", lineno 1046 response = client._run_operation( File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", lineno 1289 return self._retryable_read( File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", lineno 1391 with self._slaveok_for_server( File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", lineno 113 return next(self.gen) File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", lineno 1235 with self._get_socket(server, session, pin=pin) as sock_info: File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", lineno 113 return next(self.gen) File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", lineno 1171 with server.get_socket( File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", lineno 113 return next(self.gen) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", lineno 1374 sock_info = self._get_socket(all_credentials) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", lineno 1497 sock_info = self.connect(all_credentials) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", lineno 1323 sock = _configured_socket(self.address, self.opts) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", lineno 1020 sock = _create_connection(address, options) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", lineno 982 sock = socket.socket( gc: collectable <tuple 0x7f8df5da4b50> gc: collectable <type 0x7f8df5f326a0> gc: collectable <getset_descriptor 0x7f8df5db2580> gc: collectable <dict 0x7f8df5d83080> gc: collectable <tuple 0x7f8df5da8900> gc: collectable <tuple 0x7f8df5da4c70> gc: collectable <type 0x7f8df5f32da0> gc: collectable <tuple 0x7f8df5daef90> gc: collectable <tuple 0x7f8df5da4d00> gc: collectable <type 0x7f8df5f33410> gc: collectable <tuple 0x7f8df5db3220> Pool after gc.collect: deque([SocketInfo(<socket.socket fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>) at 0x7f8e1543e7c0])
From the above we can see the GC executes Cursor._del_ which correctly transfers ownership of the SocketInfo to the Pool but then the SocketInfo's internal socket is then immediately closed (in the same GC call):
gc: collectable <Cursor 0x7f8df5d292e0> gc: collectable <socket 0x7f8df5d734c0> gc: collectable <SocketInfo 0x7f8e1543e7c0> gc: collectable <_SocketManager 0x7f8e15619f70> gc: collectable <traceback 0x7f8e08e68340> gc: collectable <AutoReconnect 0x7f8e15796c40> Cursor.__die _SocketManager.close: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 return socket: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 2021-06-10 18:00:18,687 INFO event_loggers [pool ('127.0.0.1', 8000)][conn #2] connection checked into pool sockets.appendleft: SocketInfo(<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)>) at 0x7f8e1543e7c0 _SocketManager.__del__: None /Users/shane/git/mongo-python-driver/test/load_balancer/test_load_balancer.py:99: ResourceWarning: unclosed <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 61224), raddr=('127.0.0.1', 8000)> gc.collect()
It seems like the GC is mistakenly loosing a reference count to the socket.socket instance. We should whittle this down to a small reproducer script which doesn't reply on pymongo and report the bug to CPython.
- related to
-
PYTHON-2673 Connection pinning behavior for load balanced clusters
- Closed