3/28/2025 at 12:32:29 AM
You must be careful with many of these failure mode “mitigations”. Even a read-after-write from physical disk doesn’t necessarily guarantee that the data is hardened on the media. With SAS drives you may also be able to set the FUA and DPO bits to ensure the drive isn’t returning data from cache, but I’m not sure how that can be done on SATA drives. By default the drives will just return what’s in cache. Even the dual-write scenario can have some unexpected failure modes. The article should also have covered data protection mechanisms such as T10 DIF/DIX, which protect against certain data addressing corruptions in parts of the stack. Ultimately it’s a hard problem, and very few software developers are considering all the failure modes. Some of these failure modes are only seen when running against millions of drives, and only if you have protection at a higher layer can you figure out what actually happened.
by jmpman