I'm occasionally getting a read error on a read of the object in cache-bucket:
[1631819751:993716][13985:0x7f3ba055f700], tiered:shadow, WT_CURSOR.insert: int __posix_file_read(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t, size_t, void *), 460: ./cache-bucket/shadow-0000000002.wtobj: handle-read: pread: failed to read 32768 bytes at offset 362000384
The strange thing is that the same thread, looking at the same file, first got the (correct) much larger file size and then a smaller one. After the abort caused by the error, the database directory shows the following for shadow-0000000002.wtobj:
WT_TEST.tiered-abort/bucket:
total 2383652
-rw-r--r-- 1 sue adm 363339776 Sep 16 19:15 shadow-0000000002.wtobj

WT_TEST.tiered-abort/cache-bucket:
total 1421516
-r--r--r-- 1 sue adm 363339776 Sep 16 19:15 shadow-0000000002.wtobj
I added debugging in both local_flush and local_flush_finish to log the file size of the source file:
FLUSH: get size for shadow-0000000002.wtobj dest ./bucket/shadow-0000000002.wtobj
FLUSH: Copy shadow-0000000002.wtobj (363339776) to ./bucket/shadow-0000000002.wtobj
Checkpoint 3 complete at stable 1325376.
FLUSH_FINISH: Rename shadow-0000000002.wtobj (132055040) to ./cache-bucket/shadow-0000000002.wtobj
Flush tier 3 completed.
[1631819751:993716][13985:0x7f3ba055f700], tiered:shadow, WT_CURSOR.insert: int __posix_file_read(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t, size_t, void *), 460: ./cache-bucket/shadow-0000000002.wtobj: handle-read: pread: failed to read 32768 bytes at offset 362000384 size 132055040
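To make the numbers in the debug output concrete: the failed read ends at 362000384 + 32768 = 362033152, which is inside the 363339776-byte size logged at copy time but well past the 132055040-byte size logged at rename time. A minimal sketch of that bounds check follows; read_fits is a hypothetical helper written for illustration, not WiredTiger code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if the byte range [offset, offset + len)
 * lies entirely within a file of file_size bytes, 0 otherwise.
 */
static int
read_fits(int64_t offset, size_t len, int64_t file_size)
{
    assert(offset >= 0 && file_size >= 0);
    return (offset + (int64_t)len <= file_size);
}
```

With the values from the log, read_fits(362000384, 32768, 363339776) is true and read_fits(362000384, 32768, 132055040) is false, so the read is only invalid against the smaller size.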
local_flush_finish just renames the local database source file into the cache directory. Per the ls -l output above, the file in that directory has the larger size.
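For reference, the kind of size debugging described here could look roughly like the sketch below. It is not the actual local_flush_finish code: rename_with_sizes is a hypothetical name, and it assumes plain POSIX stat and rename on both paths. It records the size on each side of the rename so the two can be compared with the size logged at copy time.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical debugging helper, not the real local_flush_finish: record the
 * source file size, rename the file, then record the destination file size.
 * Returns 0 on success, -1 if stat or rename fails.
 */
static int
rename_with_sizes(const char *src, const char *dest, long long *before, long long *after)
{
    struct stat sb;

    assert(src != NULL && dest != NULL && before != NULL && after != NULL);

    /* Size of the source file just before the rename. */
    if (stat(src, &sb) != 0)
        return (-1);
    *before = (long long)sb.st_size;

    if (rename(src, dest) != 0)
        return (-1);

    /* Size of the same file under its new name. */
    if (stat(dest, &sb) != 0)
        return (-1);
    *after = (long long)sb.st_size;

    fprintf(stderr, "FLUSH_FINISH: Rename %s (%lld) to %s (%lld)\n", src, *before, dest, *after);
    return (0);
}
```

This only narrows down when the size changes relative to the rename; it does not by itself identify the cause.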
I can pretty reliably reproduce this style of failure with test_tiered_abort -T 12 -t 10