Changes for WT-4732 introduced a mechanism to pick dhandles randomly for eviction. This was done to avoid following any specific pattern in choosing dhandles and hence be fair in our eviction policy.
The dhandles are maintained in a global dhandle list as well as in a hash table with a fixed number of buckets (512), each bucket holding a list of the dhandles that hash to it. The change for WT-4732 was supposed to do the following:
1. If there are fewer than a certain number of dhandles, walk the global list to a random dhandle.
2. If there are a large number of dhandles, pick a random dhandle bucket and then pick a random dhandle in that bucket's dhandle list.
The changes introduced a bug in step 2: after picking a random bucket with, say, N dhandles in its list, instead of picking a random dhandle from among those N, the code picked a random dhandle from the next N in the global list. This limited eviction to a small subset of the dhandles in the global list.
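
To illustrate the difference, here is a minimal, self-contained sketch of step 2. It is not the WiredTiger source: the struct layout and the names (dhandle, handle_table, table_add, pick_in_bucket, pick_in_bucket_buggy) are hypothetical, and the random offset is passed in as "skip" so the behavior is deterministic. It assumes each dhandle carries one link for the global list and a separate link for its bucket list, and that the faulty walk started at the bucket head but advanced along the global link.

```c
#include <stddef.h>

#define HASH_BUCKETS 512 /* bucket count taken from the description above */

/* Hypothetical, simplified handle; each handle sits on two lists at once. */
struct dhandle {
    struct dhandle *global_next; /* link in the global list */
    struct dhandle *hash_next;   /* link in this handle's bucket list */
    int id;
};

struct handle_table {
    struct dhandle *global_head;
    struct dhandle *buckets[HASH_BUCKETS];
    size_t bucket_count[HASH_BUCKETS];
};

/* Insert a handle at the head of both the global list and its bucket list. */
static void
table_add(struct handle_table *t, struct dhandle *dh, size_t bucket)
{
    dh->global_next = t->global_head;
    t->global_head = dh;
    dh->hash_next = t->buckets[bucket];
    t->buckets[bucket] = dh;
    t->bucket_count[bucket]++;
}

/* Intended behavior: advance "skip" steps along the bucket's own list. */
static struct dhandle *
pick_in_bucket(const struct handle_table *t, size_t bucket, size_t skip)
{
    struct dhandle *dh = t->buckets[bucket];

    if (dh == NULL)
        return NULL;
    for (skip %= t->bucket_count[bucket]; skip > 0; skip--)
        dh = dh->hash_next;
    return dh;
}

/*
 * Faulty behavior (assumed shape of the bug): start at the bucket head but
 * advance along the global link, so the result can belong to another bucket.
 */
static struct dhandle *
pick_in_bucket_buggy(const struct handle_table *t, size_t bucket, size_t skip)
{
    struct dhandle *dh = t->buckets[bucket];

    if (dh == NULL)
        return NULL;
    for (skip %= t->bucket_count[bucket]; skip > 0 && dh->global_next != NULL; skip--)
        dh = dh->global_next;
    return dh;
}
```

With six handles added in order 0..5 and alternating between buckets 0 and 1, bucket 0 holds handles 4, 2, 0 while the global list reads 5, 4, 3, 2, 1, 0. A skip of 1 in bucket 0 yields handle 2 from the intended walk, but handle 3 (a member of bucket 1) from the faulty one.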
This bug leaves the eviction server unable to find pages to evict; eventually the cache fills up and the workload stalls.