-
Type: Improvement
-
Resolution: Done
-
Priority: Critical - P2
-
None
-
Affects Version/s: None
-
Component/s: None
-
Storage - Ra 2022-03-21
-
v6.0
Summary
The atomic operations in WiredTiger should ensure that a full memory barrier is provided by them.
Motivation
The lock-free algorithms in use by WiredTiger rely on the instructions not getting re-ordered across the atomics. On GCC WiredTiger uses the __atomic builtins to implement its atomic operations. The __atomic builtins provided by the compiler could potentially not be utilizing a full memory barrier instruction on some platforms with a relaxed memory model like AArch64 (ARM64). This could leave WiredTiger exposed to a data corruption possibility on such platforms. WiredTiger should make sure the underlying primitives it uses include a full memory sync each time they are used.
For reference:
A thread on GCC discussing why __atomic builtins might not be strong enough for the older __sync builtins.
Another thread discussing how AArch64 atomics might allow subtle re-orders.
- Does this affect any team outside of WT?
No
- How likely is it that this use case or problem will occur?
Arguably possible on x86-64. Very rare, but possible on AArch64 (ARM64).
- If the problem does occur, what are the consequences and how severe are they?
On the platforms like ARM64 that have a relaxed memory model for the SMP architecture, it could result in subtle data corruption bugs.
- Is this issue urgent?
Yes
Acceptance Criteria (Definition of Done)
All the atomic operations have been guaranteed to provide a full memory barrier on the various platforms WiredTiger supports. The performance impact has been evaluated and found to be acceptable. WiredTiger stress testing and MongoDB patch testing has been successfully completed.
Testing
WiredTiger stress tests, MongoDB patch tests, WiredTiger and MongoDB performance tests.
Documentation update
Change the documentation on building on the POSIX systems to ensure the build provides a full barrier that WiredTiger needs.
Updated Scope
The scope for this ticket has been limited to an investigation into the atomic operations on the ARM platform. New tickets will be created for any follow on work.
[Optional] Suggested Solution
A possible way is to use __sync builtins instead of __atomic builtins as they are meant to provide a full barrier. The modern __sync builtins are implemented using the __atomic builtins, so we have to be careful about doing so. Newer versions of GCC should handle this correctly since a change went into GCC 5.0 to provide __atomic builtins with a full barrier to support the __sync builtins.
Another possibility could be to directly use the MEMMODEL_SYNC with __atomic builtins that was introduced to fix the __sync builtins.
We will have to investigate the facilities provided by the MSVC compiler.
- depends on
-
WT-8946 Use the maximum number of cores for ARM64 stress testing and disable ASan tests on this build variant
- Closed
-
WT-9079 All memory barriers on ARM should be DMB instead of DSB
- Closed
-
WT-9081 Reintroduce PPC tests
- Closed
- is related to
-
WT-9256 Investigate changes to WiredTiger for non-LSE ARM platforms
- Closed
-
WT-6798 Utilize Arm LSE atomics and the correct strength barriers
- Closed
-
WT-9124 Evaluate increasing stress testing coverage for PPC
- Backlog
-
WT-2786 Migrate WiredTiger to the C11 memory model
- Closed
-
WT-9080 Increase the amount of stress testing on ARM platform
- Closed