We should create a Python test case that holds back the stable timestamp, inserts a bunch of data, creates a checkpoint, inserts a little more data and creates a new checkpoint.
The second checkpoint should only do enough I/O to review the content updated since the previous checkpoint.