- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
Improvements to log slot freeing to improve thread scalability
Investigated the negative scaling of the write-ahead log seen in SERVER-18908 and SERVER-19189. Found two issues; an experimental patch that appears to address both is attached.
- Threads are often waiting because there are no FREE slots. Slots are freed by __log_wrlsn_server. However, because this is done asynchronously there may be unnecessary delay in freeing slots, for a couple of reasons: if there is thread contention, __log_wrlsn_server may not get scheduled; it uses yields and sleeps, so it may not notice when slots become freeable; and because the thread waiting for a FREE slot in __wt_log_slot_close also uses yields and sleeps, it may not notice right away when a slot is freed. The patch addresses this by pulling the slot-freeing logic out of the __log_wrlsn_server loop into a function __log_wrlsn, which is then called from __wt_log_slot_close when it has scanned all the slots and not found a FREE one (a rough sketch of this change follows the slot-state examples below). This call is made with log_slot_lock held for thread safety, but that's ok because at that point any thread that would have taken that lock would have become stuck anyway due to the lack of FREE slots.
- By adding some messages to the code I noticed that often, when threads were stuck in __wt_log_slot_close waiting for a FREE slot, there were many WRITTEN slots but no FREE slots, because the oldest slot was not yet WRITTEN (either because it was waiting for i/o to complete, or, more often, because it was waiting for all threads that had joined the slot to copy their data into the buffer and transition the slot to DONE - presumably because one of the threads that had to do so was held up by contention). In other words, the slots looked like this:
SLOT: start_lsn=1000, end_lsn=2000, state<DONE (i.e. threads copying data into the slot buffer)
SLOT: start_lsn=2000, end_lsn=3000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: start_lsn=3000, end_lsn=4000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: start_lsn=4000, end_lsn=5000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: start_lsn=5000, end_lsn=6000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
As I understand the algorithm, the only purpose of the WRITTEN slots is to keep track of holes in the log file (for example, 1000-2000 in the example above) so we can correctly advance the LSN - is that right? However, they aren't doing so very efficiently - the same information could be recorded by coalescing the WRITTEN slots into a single one (more specifically, one for each hole in the log file) and making the other slots FREE, like so (a rough sketch of this coalescing follows the examples below):
SLOT: start_lsn=1000, end_lsn=2000, state<DONE (i.e. threads copying data into the slot buffer)
SLOT: start_lsn=2000, end_lsn=6000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: state=FREE
SLOT: state=FREE
SLOT: state=FREE
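To make the first change concrete, here is a minimal sketch of the pattern (not the actual WiredTiger code or the attached patch): a thread that scans the pool and finds no FREE slot takes the slot lock and runs the write-LSN/free pass itself, instead of yielding and hoping the background server thread gets scheduled. All names below (log_t, log_slot_t, log_wrlsn, slot_find_free, SLOT_POOL) are invented for illustration; the real code uses atomic state transitions and different structures.

#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

#define SLOT_POOL 16                        /* hypothetical pool size */

typedef enum { SLOT_FREE, SLOT_ACTIVE, SLOT_DONE, SLOT_WRITTEN } slot_state_t;

typedef struct {
    slot_state_t state;
    unsigned long start_lsn, end_lsn;       /* LSN range covered by this slot */
} log_slot_t;

typedef struct {
    pthread_mutex_t slot_lock;              /* protects the slot pool */
    log_slot_t slots[SLOT_POOL];
    unsigned long write_lsn;                /* log is known written up to here */
} log_t;

/*
 * log_wrlsn --
 *     Advance write_lsn over WRITTEN slots that start at it and free them.
 *     Factored out of the background server loop so a stuck foreground
 *     thread can call it directly.  Caller must hold slot_lock.
 */
static void
log_wrlsn(log_t *log)
{
    bool progress;
    int i;

    do {
        progress = false;
        for (i = 0; i < SLOT_POOL; i++) {
            log_slot_t *s = &log->slots[i];
            if (s->state == SLOT_WRITTEN && s->start_lsn == log->write_lsn) {
                log->write_lsn = s->end_lsn;    /* no hole below: advance */
                s->state = SLOT_FREE;           /* and release the slot */
                progress = true;
            }
        }
    } while (progress);
}

/*
 * slot_find_free --
 *     Claim a FREE slot.  If there is none, run the free pass inline while
 *     we already hold slot_lock rather than sleeping: any other thread
 *     wanting the lock would be stuck on the same shortage anyway.
 */
static log_slot_t *
slot_find_free(log_t *log)
{
    int i;

    for (;;) {
        pthread_mutex_lock(&log->slot_lock);
        for (i = 0; i < SLOT_POOL; i++)
            if (log->slots[i].state == SLOT_FREE) {
                log->slots[i].state = SLOT_ACTIVE;  /* claim it */
                pthread_mutex_unlock(&log->slot_lock);
                return (&log->slots[i]);
            }
        log_wrlsn(log);                     /* free WRITTEN slots ourselves */
        pthread_mutex_unlock(&log->slot_lock);
        sched_yield();                      /* nothing freeable yet, back off */
    }
}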
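And a similarly hedged sketch of the coalescing idea, reusing the invented types from the sketch above (again, this is an illustration of the bookkeeping, not the attached patch): adjacent WRITTEN slots are merged into one slot covering the combined LSN range, so the pool is not exhausted while an older slot is still being filled, and the hole information needed to advance the LSN is preserved.

/*
 * log_coalesce_written --
 *     Merge WRITTEN slots whose LSN ranges are contiguous into a single
 *     WRITTEN slot (one per hole in the log) and return the rest to the
 *     FREE pool.  Caller must hold slot_lock.
 */
static void
log_coalesce_written(log_t *log)
{
    log_slot_t *a, *b;
    int i, j;

    /* An O(n^2) scan is acceptable here: the slot pool is small. */
    for (i = 0; i < SLOT_POOL; i++) {
        a = &log->slots[i];
        if (a->state != SLOT_WRITTEN)
            continue;
        for (j = 0; j < SLOT_POOL; j++) {
            b = &log->slots[j];
            if (b == a || b->state != SLOT_WRITTEN)
                continue;
            if (b->start_lsn == a->end_lsn) {
                /* b follows a immediately: widen a, free b. */
                a->end_lsn = b->end_lsn;
                b->state = SLOT_FREE;
                j = -1;                     /* rescan against the wider range */
            }
        }
    }
}

On the example above, this turns the four WRITTEN slots into a single WRITTEN slot spanning 2000-6000 plus three FREE slots, which is exactly the second listing.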
The attached patch is a POC-level implementation of the above. Some performance numbers, for n client threads doing inserts of tiny documents in 10k batches into a standalone mongod server on a machine with 12 cores (24 cpus):
threads    3.0.4    3.0.4 + WTlog.patch
      8   278401                 280608
     16   379076                 405451
     24   232358                 407481
     32   158440                 334523
     48   125652                 246961
     64   118095                 220157
- performance with a large number of threads is roughly doubled
- there is still some negative scaling at large thread counts, so there may be additional bottlenecks to address
So this seems good from a performance perspective, at least on this test. I have not done any functional testing on it. michael.cahill, sue.loverso, can you take a look and see if this makes sense to you?
is depended on by:
- SERVER-18908 Secondaries unable to keep up with primary under WiredTiger (Closed)
- SERVER-19282 WiredTiger changes in MongoDB 3.1.6 (Closed)
- SERVER-19283 WiredTiger changes for MongoDB 3.0.5 (Closed)
- SERVER-19189 Improve performance under high number of threads with WT (Closed)
- SERVER-19532 WiredTiger changes for MongoDB 3.1.7 (Closed)
- SERVER-19744 WiredTiger changes for MongoDB 3.0.6 (Closed)