@agorrod, @michaelcahill: there's a stall in the new-split branch. I was hoping Michael's WT-931 would fix it, but I can still reproduce the problem. Here's the config I'm using; the more threads, the sooner it fires:
    file_type=row
    data_source=file
    checkpoints=1
    cache=5
    compression=none
    leaf_page_max=12
    internal_page_max=12
    ops=1000000
    rows=1000
    key_max=32
    value_max=32
and we end up with checkpoint in an infinite loop walking the tree:
    #0 __wt_tree_walk (session=0x8024ff180, pagep=0x7ffffddee9d8, flags=320) at ../src/btree/bt_walk.c:317
    #1 0x0000000000446afd in __wt_sync_file (session=0x8024ff180, syncop=8) at ../src/btree/bt_evict.c:655
    #2 0x0000000000457477 in __wt_bt_cache_op (session=0x8024ff180, ckptbase=0x8062fe400, op=8) at ../src/btree/bt_sync.c:59
    #3 0x00000000004403bc in __checkpoint_worker (session=0x8024ff180, cfg=0x7ffffddeede0, is_checkpoint=1) at ../src/txn/txn_ckpt.c:750
It looks to me like checkpoint is looping between two pages: the "couple" page and the page after it, whose ref is in the WT_REF_SPLIT state. Checkpoint tries to read the split page, gets a WT_RESTART return, falls back to the "couple" page, does a next, and winds up on the same split page again.
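To make the suspected loop concrete, here is a toy model of it. This is only a sketch of my reading of the behavior, not the code in bt_walk.c: `toy_state`, `toy_walk` and the `max_steps` cutoff are hypothetical names invented for illustration, and the tree is flattened to an array of page states.

```c
#include <stddef.h>

/* Hypothetical page states for this toy model; not the real WT_REF_* values. */
enum toy_state { TOY_MEM, TOY_SPLIT };

/*
 * Walk refs[0..n-1] left to right, holding one "couple" page at a time.
 * Trying to read a TOY_SPLIT page acts like a restart: the walker stays on
 * the couple page and asks for the next page again.
 *
 * Returns the number of steps taken, or -1 if max_steps is exceeded, which
 * in this model means the walker is stuck between the same two pages.
 */
int toy_walk(const enum toy_state *refs, size_t n, int max_steps)
{
    size_t couple = 0; /* index of the page currently held */
    size_t next;
    int steps = 0;

    while (couple + 1 < n) {
        if (++steps > max_steps)
            return -1;     /* no forward progress */
        next = couple + 1;
        if (refs[next] == TOY_SPLIT)
            continue;      /* restart: back to couple, retry the same next */
        couple = next;     /* read succeeded: advance */
    }
    return steps;
}
```

In this model, a walk over pages that are all `TOY_MEM` finishes in `n - 1` steps, while a walk with a `TOY_SPLIT` page anywhere after the first page never gets past it unless something else changes that page's state. That matches what the backtrace suggests: nothing ever moves the ref out of the split state while checkpoint is retrying.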
I can reproduce the problem even without deepening the tree, so this is a fundamental issue in splitting (maybe an eviction race with checkpoint, maybe a race inside split itself).