-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.6.5
-
Component/s: Stability
-
Environment:Google Cloud Instance n1-highmem-4 (4 vCPUs, 26 GB memory) with Debian 9.4.
-
ALL
-
- Initiate chunk moving
- Wait for several hours and mongod will either exit with out-of-memory error or be killed by oom-killer
A two shards MongoDB database regularly crashes with out-of-memory error or is being killed by the oom-killer. The system runs on GCE Debian 9.4 with MongoDB v3.6.5, WiredTiger storage engine. The servers are n1-highmem-4 (4 vCPUs, 26 GB memory). On the server runs just mongod and there are no other services. mongos are on different servers.
Initially the servers were without swap as it is the practice on Google Cloud, but afterwards a small 1GB swap has been added, but the problem is the same.
We have tried to play with the cacheSizeGB parameter and reduced it to 10GB but still the crash happens.
Usually process exit/crash happens al least once a day.
Currently on the database there is a chunk moving process underway and crashes happens only on the shard instances that receive chunks. Sending chunks instances never crashed.
If mongod process is killed by oom-killer this can be seen in the syslogs:
Jun 15 14:45:17 server4 kernel: [1731430.432189] Out of memory: Kill process 13130 (mongod) score 980 or sacrifice child Jun 15 14:45:17 server4 kernel: [1731430.441717] Killed process 13130 (mongod) total-vm:28280536kB, anon-rss:26174876kB, file-rss:0kB, shmem-rss:0kB
{{}}Sometimes mongod exits with leaving this in the mongod.log:
2018-06-15T02:14:32.456+0200 F - [rsSync] out of memory.0x55cbc8535751 0x55cbc8534d84 0x55cbc8623b4b 0x55cbc86c665c 0x55cbc70fccff 0x55cbc70f8b02 0x55cbc707b3f1 0x55cbc86449b0 0x7fbbf3507494 0x7fbbf3249acf ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"55CBC6305000","o":"2230751","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55CBC6305000","o":"222FD84","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"55CBC6305000","o":"231EB4B"},{"b":"55CBC6305000","o":"23C165C","s":"_Znam"},{"b":"55CBC6305000","o":"DF7CFF","s":"_ZN5mongo4repl8SyncTail7OpQueueC1Ev"},{"b":"55CBC6305000","o":"DF3B02","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE"},{"b":"55CBC6305000","o":"D763F1","s":"_ZN5mongo4repl10RSDataSync4_runEv"},{"b":"55CBC6305000","o":"233F9B0"},{"b":"7FBBF3500000","o":"7494"},{"b":"7FBBF3161000","o":"E8ACF","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.5", "gitVersion" : "a20ecd3e3a174162052ff99913bc2ca9a839d618", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-6-amd64", "version" : "#1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07)", "machine" : "x86_64" }, "somap" : [ { "b" : "55CBC6305000", "elfType" : 3, "buildId" : "7D4592BDFAA6C15459D2319DEAB7F10E9EB4E7D7" }, { "b" : "7FFC48D98000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "A3207CC9FE1CAA3374AE7061AA5C3C5619B8A0E5" }, { "b" : "7FBBF4743000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "713D47D5F599289C0A91ADE8F0122B2B4AA78B2E" }, { "b" : "7FBBF42B0000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "2CFE882A331D7857E9CE1B5DE3255E6DA76EF899" }, { "b" : "7FBBF4044000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "E2AA3B39763D943F56B3BD05C8E36E639BA95E12" }, { "b" : "7FBBF3E40000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "B895F0831F623C5F23603401D4069F9F94C24761" }, { "b" : "7FBBF3C38000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "5D83E0642E645026DBB11F89F7DF7106BD821495" }, { "b" : "7FBBF3934000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1B95E3A8B8788B07E4F59EE69B1877F9DEB42033" }, { "b" : "7FBBF371D000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7FBBF3500000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "4285CD3158DDE596765C747AE210AB6CBD258B22" }, { "b" : "7FBBF3161000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "AA889E26A70F98FA8D230D088F7CC5BF43573163" }, { "b" : "7FBBF495A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "263F909DBE11A66F7C6233E3FF0521148D9F8370" } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55cbc8535751] mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x84) [0x55cbc8534d84] mongod(+0x231EB4B) [0x55cbc8623b4b] mongod(_Znam+0x21C) [0x55cbc86c665c] mongod(_ZN5mongo4repl8SyncTail7OpQueueC1Ev+0x7F) [0x55cbc70fccff] mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE+0x402) [0x55cbc70f8b02] mongod(_ZN5mongo4repl10RSDataSync4_runEv+0x111) [0x55cbc707b3f1] mongod(+0x233F9B0) [0x55cbc86449b0] libpthread.so.0(+0x7494) [0x7fbbf3507494] libc.so.6(clone+0x3F) [0x7fbbf3249acf] ----- END BACKTRACE -----
I would suspect memory management is not working properly while receiving chunks.
- duplicates
-
SERVER-36443 Long-running queries should not cause a build-up of unused ChunkManager objects
- Closed
- is related to
-
SERVER-35444 Heap stacks should not be included in serverStatus with heapprofilingenabled
- Closed
-
SERVER-42138 Memory used by ChunkManager objects can grow without bound
- Closed