-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 3.0.1, 3.2.0-rc0
-
Component/s: Internal Code
-
None
-
Fully Compatible
-
ALL
The "MongoDB starting" logline (and related lines) are logged when mongod/mongos starts up, and provide basic information about the environment in use that is invaluable when diagnosing issues:
$ /m/3.0.0/bin/mongod --port 12345 2015-10-27T10:48:11.823+1100 I CONTROL [initandlisten] MongoDB starting : pid=29676 port=12345 dbpath=/data/db 64-bit host=genique 2015-10-27T10:48:11.823+1100 I CONTROL [initandlisten] db version v3.0.0 2015-10-27T10:48:11.823+1100 I CONTROL [initandlisten] git version: a841fd6394365954886924a35076691b4d149168 2015-10-27T10:48:11.823+1100 I CONTROL [initandlisten] build info: Linux build15.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49 2015-10-27T10:48:11.823+1100 I CONTROL [initandlisten] allocator: tcmalloc 2015-10-27T10:48:11.823+1100 I CONTROL [initandlisten] options: { net: { port: 12345 } } 2015-10-27T10:48:11.848+1100 I JOURNAL [initandlisten] journal dir=/data/db/journal 2015-10-27T10:48:11.848+1100 I JOURNAL [initandlisten] recover : no journal files present, no recovery needed 2015-10-27T10:48:11.981+1100 I JOURNAL [durability] Durability thread started 2015-10-27T10:48:11.981+1100 I JOURNAL [journal writer] Journal writer thread started 2015-10-27T10:48:11.987+1100 I NETWORK [initandlisten] waiting for connections on port 12345
However, since 3.0.1, much initialisation work is done ahead of these basic log lines. Specifically, storage engine initialisation (which includes journal replay), and setting up sockets (which is worse as of 3.1.9).
$ /m/3.0.1/bin/mongod --port 12345 2015-10-27T10:48:20.026+1100 I JOURNAL [initandlisten] journal dir=/data/db/journal 2015-10-27T10:48:20.027+1100 I JOURNAL [initandlisten] recover : no journal files present, no recovery needed 2015-10-27T10:48:20.160+1100 I JOURNAL [durability] Durability thread started 2015-10-27T10:48:20.160+1100 I JOURNAL [journal writer] Journal writer thread started 2015-10-27T10:48:20.161+1100 I CONTROL [initandlisten] MongoDB starting : pid=29829 port=12345 dbpath=/data/db 64-bit host=genique 2015-10-27T10:48:20.162+1100 I CONTROL [initandlisten] db version v3.0.1 2015-10-27T10:48:20.162+1100 I CONTROL [initandlisten] git version: 534b5a3f9d10f00cd27737fbcd951032248b5952 2015-10-27T10:48:20.162+1100 I CONTROL [initandlisten] build info: Linux build16.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49 2015-10-27T10:48:20.162+1100 I CONTROL [initandlisten] allocator: tcmalloc 2015-10-27T10:48:20.162+1100 I CONTROL [initandlisten] options: { net: { port: 12345 } } 2015-10-27T10:48:20.188+1100 I NETWORK [initandlisten] waiting for connections on port 12345
While I understand the need to initialise sockets and the storage engine as early as possible, it can be very problematic for these to be done ahead of basic logging of things like version number, build info, and options. This is because there are ample opportunities where the server can log errors and exit before the basic environment information has been logged.
For example, compare the behaviour when the port is already in use:
$ /m/3.2.0-rc0/bin/mongod 2015-10-27T10:41:55.964+1100 E NETWORK [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017 2015-10-27T10:41:55.964+1100 E NETWORK [initandlisten] addr already in use 2015-10-27T10:41:55.964+1100 E STORAGE [initandlisten] Failed to set up sockets during startup. 2015-10-27T10:41:55.964+1100 I CONTROL [initandlisten] dbexit: rc: 48 $ /m/3.0.0/bin/mongod 2015-10-27T11:11:21.686+1100 I CONTROL [initandlisten] MongoDB starting : pid=17987 port=27017 dbpath=/data/db 64-bit host=genique 2015-10-27T11:11:21.686+1100 I CONTROL [initandlisten] db version v3.0.0 2015-10-27T11:11:21.686+1100 I CONTROL [initandlisten] git version: a841fd6394365954886924a35076691b4d149168 2015-10-27T11:11:21.686+1100 I CONTROL [initandlisten] build info: Linux build15.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49 2015-10-27T11:11:21.686+1100 I CONTROL [initandlisten] allocator: tcmalloc 2015-10-27T11:11:21.686+1100 I CONTROL [initandlisten] options: {} 2015-10-27T11:11:21.711+1100 E NETWORK [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017 2015-10-27T11:11:21.711+1100 E NETWORK [initandlisten] addr already in use 2015-10-27T11:11:21.718+1100 I JOURNAL [initandlisten] journal dir=/data/db/journal 2015-10-27T11:11:21.718+1100 I JOURNAL [initandlisten] recover : no journal files present, no recovery needed 2015-10-27T11:11:21.864+1100 I JOURNAL [durability] Durability thread started 2015-10-27T11:11:21.864+1100 I JOURNAL [journal writer] Journal writer thread started 2015-10-27T11:11:21.871+1100 I CONTROL [initandlisten] now exiting 2015-10-27T11:11:21.871+1100 I NETWORK [initandlisten] shutdown: going to close listening sockets... 2015-10-27T11:11:21.871+1100 I NETWORK [initandlisten] shutdown: going to flush diaglog... 2015-10-27T11:11:21.871+1100 I NETWORK [initandlisten] shutdown: going to close sockets... 2015-10-27T11:11:21.872+1100 I STORAGE [initandlisten] shutdown: waiting for fs preallocator... 2015-10-27T11:11:21.872+1100 I STORAGE [initandlisten] shutdown: final commit... 2015-10-27T11:11:21.876+1100 I JOURNAL [initandlisten] journalCleanup... 2015-10-27T11:11:21.876+1100 I JOURNAL [initandlisten] removeJournalFiles 2015-10-27T11:11:21.878+1100 I JOURNAL [initandlisten] Terminating durability thread ... 2015-10-27T11:11:21.974+1100 I JOURNAL [journal writer] Journal writer thread stopped 2015-10-27T11:11:21.974+1100 I JOURNAL [durability] Durability thread stopped 2015-10-27T11:11:21.975+1100 I STORAGE [initandlisten] shutdown: closing all files... 2015-10-27T11:11:21.976+1100 I STORAGE [initandlisten] closeAllFiles() finished 2015-10-27T11:11:21.976+1100 I STORAGE [initandlisten] shutdown: removing fs lock... 2015-10-27T11:11:21.976+1100 I CONTROL [initandlisten] dbexit: rc: 48
And the following actual (full) logfiles from the field, which contain no useful context about version, build, etc. This means that support engineers are left hunting for crumbs of information, like the timestamp/severity/component format indicating 3.0.x, and the Hotfix line and backslashes in paths indicating Windows.
2015-10-20T09:09:31.180-0500 I CONTROL Hotfix KB2731284 or later update is not installed, will zero-out data files 2015-10-20T09:09:31.204-0500 I JOURNAL [initandlisten] journal dir=db\journal 2015-10-20T09:09:31.205-0500 I JOURNAL [initandlisten] recover : no journal files present, no recovery needed 2015-10-20T09:09:31.205-0500 I JOURNAL [initandlisten] 2015-10-20T09:09:31.206-0500 E JOURNAL [initandlisten] Insufficient free space for journal files 2015-10-20T09:09:31.206-0500 I JOURNAL [initandlisten] Please make at least 3379MB available in db\journal or use --smallfiles 2015-10-20T09:09:31.206-0500 I JOURNAL [initandlisten] 2015-10-20T09:09:31.207-0500 I STORAGE [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating 2015-10-20T09:09:31.207-0500 I CONTROL [initandlisten] now exiting 2015-10-20T09:09:31.207-0500 I NETWORK [initandlisten] shutdown: going to close listening sockets... 2015-10-20T09:09:31.207-0500 I NETWORK [initandlisten] shutdown: going to flush diaglog... 2015-10-20T09:09:31.208-0500 I NETWORK [initandlisten] shutdown: going to close sockets... 2015-10-20T09:09:31.208-0500 I STORAGE [initandlisten] shutdown: waiting for fs preallocator... 2015-10-20T09:09:31.208-0500 I STORAGE [initandlisten] shutdown: final commit... 2015-10-20T09:09:31.208-0500 I STORAGE [initandlisten] shutdown: closing all files... 2015-10-20T09:09:31.208-0500 I STORAGE [initandlisten] closeAllFiles() finished 2015-10-20T09:09:31.208-0500 I CONTROL [initandlisten] dbexit: rc: 100
2015-10-21T08:39:23.982-0500 I CONTROL Hotfix KB2731284 or later update is not installed, will zero-out data files 2015-10-21T08:39:23.982-0500 W - [initandlisten] Detected unclean shutdown - db\mongod.lock is not empty. 2015-10-21T08:39:23.983-0500 I JOURNAL [initandlisten] journal dir=db\journal 2015-10-21T08:39:23.983-0500 I JOURNAL [initandlisten] recover begin 2015-10-21T08:39:23.983-0500 I JOURNAL [initandlisten] recover lsn: 6474133 2015-10-21T08:39:23.983-0500 I JOURNAL [initandlisten] recover db\journal\j._2 2015-10-21T08:39:23.983-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5634193 < lsn:6474133 2015-10-21T08:39:24.013-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5694183 < lsn:6474133 2015-10-21T08:39:24.606-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5754183 < lsn:6474133 2015-10-21T08:39:25.650-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5814183 < lsn:6474133 2015-10-21T08:39:26.261-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5874173 < lsn:6474133 2015-10-21T08:39:26.977-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5934173 < lsn:6474133 2015-10-21T08:39:27.486-0500 I JOURNAL [initandlisten] recover skipping application of section seq:5994163 < lsn:6474133 2015-10-21T08:39:28.106-0500 I JOURNAL [initandlisten] recover skipping application of section seq:6054163 < lsn:6474133 2015-10-21T08:39:28.605-0500 I JOURNAL [initandlisten] recover skipping application of section seq:6114163 < lsn:6474133 2015-10-21T08:39:29.363-0500 I JOURNAL [initandlisten] recover skipping application of section more... 2015-10-21T08:39:29.988-0500 I JOURNAL [initandlisten] recover db\journal\j._3 2015-10-21T08:39:42.883-0500 I JOURNAL [initandlisten] recover cleaning up 2015-10-21T08:39:42.884-0500 I JOURNAL [initandlisten] removeJournalFiles 2015-10-21T08:39:42.921-0500 I JOURNAL [initandlisten] recover done 2015-10-21T08:39:42.921-0500 I JOURNAL [initandlisten] 2015-10-21T08:39:42.921-0500 E JOURNAL [initandlisten] Insufficient free space for journal files 2015-10-21T08:39:42.921-0500 I JOURNAL [initandlisten] Please make at least 422MB available in db\journal or use --smallfiles 2015-10-21T08:39:42.921-0500 I JOURNAL [initandlisten] 2015-10-21T08:39:42.921-0500 I STORAGE [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating 2015-10-21T08:39:42.922-0500 I CONTROL [initandlisten] now exiting 2015-10-21T08:39:42.922-0500 I NETWORK [initandlisten] shutdown: going to close listening sockets... 2015-10-21T08:39:42.922-0500 I NETWORK [initandlisten] shutdown: going to flush diaglog... 2015-10-21T08:39:42.922-0500 I NETWORK [initandlisten] shutdown: going to close sockets... 2015-10-21T08:39:42.922-0500 I STORAGE [initandlisten] shutdown: waiting for fs preallocator... 2015-10-21T08:39:42.922-0500 I STORAGE [initandlisten] shutdown: final commit... 2015-10-21T08:39:42.922-0500 I STORAGE [initandlisten] shutdown: closing all files... 2015-10-21T08:39:42.923-0500 I STORAGE [initandlisten] closeAllFiles() finished 2015-10-21T08:39:42.923-0500 I CONTROL [initandlisten] dbexit: rc: 100
2015-10-22T20:24:06.133-0500 I CONTROL Hotfix KB2731284 or later update is not installed, will zero-out data files 2015-10-22T20:24:06.134-0500 W - [initandlisten] Detected unclean shutdown - db\mongod.lock is not empty. 2015-10-22T20:24:06.142-0500 I STORAGE [initandlisten] ************** old lock file: db\mongod.lock. probably means unclean shutdown, but there are no journal files to recover. this is likely human error or filesystem corruption. please make sure that your journal directory is mounted. found 4 dbs. see: http://dochub.mongodb.org/core/repair for more information ************* 2015-10-22T20:24:06.147-0500 I STORAGE [initandlisten] exception in initAndListen: 12596 old lock file, terminating 2015-10-22T20:24:06.149-0500 I CONTROL [initandlisten] dbexit: rc: 100
Logging this very basic information ahead of initialising sockets or storage should not deprive those systems of any significant resources. Is there any reason it should not be done first thing inside _initAndListen? As long as initialising sockets comes immediately after, followed immediately by storage engine init, then this should be fine, right?