A stepdown during MigrationSourceManager::enterCriticalSection can trigger the cleanupOnError scope guard, which eventually calls MigrationSourceManager::_cleanup. That function std::moves the manager's clone driver into a local variable so that it is destroyed when the function exits. However, it calls two other functions before it calls cancelClone on the clone driver (which puts the driver into state kDone). If either of them throws (ShardServerCatalogCacheLoader::waitForCollectionFlush can, if the node's replication role changes), the local clone driver is destroyed during unwinding while still in state kCloning, and the invariant in its destructor fails.
I think the fix would be to either move the cancelClone call earlier in _cleanup, or put it in a scope guard declared after _cloneDriver is extracted into a local variable.
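To illustrate the second option, here is a minimal, self-contained sketch of the ordering problem and the scope-guard fix. It is not the server code: FakeCloneDriver, ScopeGuard, waitForFlush, cleanupCurrent, cleanupWithGuard and destroyedWhileCloning are all hypothetical stand-ins, the clone driver is assumed to be held in a std::unique_ptr, and the destructor records the violation in a flag instead of firing an invariant so the behaviour can be observed without aborting. A real patch would presumably use the tree's existing scope guard utility rather than a hand-rolled one.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the clone driver; only its state machine matters here.
class FakeCloneDriver {
public:
    enum class State { kCloning, kDone };

    explicit FakeCloneDriver(bool* destroyedWhileCloning)
        : _destroyedWhileCloning(destroyedWhileCloning) {}

    ~FakeCloneDriver() {
        // The real destructor invariants on the state. This sketch only
        // records the violation so that it does not abort the process.
        if (_state != State::kDone) {
            *_destroyedWhileCloning = true;
        }
    }

    void cancelClone() noexcept { _state = State::kDone; }

private:
    State _state = State::kCloning;
    bool* _destroyedWhileCloning;
};

// Tiny hand-rolled scope guard (assumption: stands in for the tree's own utility).
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : _f(std::move(f)) {}
    ~ScopeGuard() { _f(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    F _f;
};

// Stand-in for a call that can throw when the replication role changes.
void waitForFlush(bool shouldThrow) {
    if (shouldThrow) {
        throw std::runtime_error("replication role changed");
    }
}

// Ordering described in the report: a throwing call runs before cancelClone.
void cleanupCurrent(std::unique_ptr<FakeCloneDriver>& member, bool flushThrows) {
    assert(member);
    auto cloneDriver = std::move(member);
    waitForFlush(flushThrows);   // may throw; cancelClone is then skipped
    cloneDriver->cancelClone();
}

// Proposed ordering: the guard is declared after the local, so it is
// destroyed first and cancelClone always runs before the driver's destructor.
void cleanupWithGuard(std::unique_ptr<FakeCloneDriver>& member, bool flushThrows) {
    assert(member);
    auto cloneDriver = std::move(member);
    ScopeGuard cancelGuard([&] { cloneDriver->cancelClone(); });
    waitForFlush(flushThrows);
}

// Returns true if the driver was destroyed while still in kCloning.
bool destroyedWhileCloning(bool useGuard, bool flushThrows) {
    bool violated = false;
    auto driver = std::make_unique<FakeCloneDriver>(&violated);
    try {
        if (useGuard) {
            cleanupWithGuard(driver, flushThrows);
        } else {
            cleanupCurrent(driver, flushThrows);
        }
    } catch (const std::runtime_error&) {
        // Swallowed here only so the caller can inspect the flag.
    }
    return violated;
}
```

In this sketch, destroyedWhileCloning(false, true) reports the violation and destroyedWhileCloning(true, true) does not. The guard version relies on locals being destroyed in reverse declaration order, which is why the guard has to be declared after the clone driver is moved into the local variable.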