Lock contention is a real issue for any multi-threaded system, and while a RW mutex is useful when you have a longer-running critical section, for something very short-lived there is still a cache-coherence coordination cost on the lock itself. In many of the HashiCorp applications, we work around this by using an immutable radix tree design instead [1]. Instead of a RW mutex, you have a single writer lock. A writer acquires the lock, makes its changes, and generates a new root pointer for the tree (every update produces a new root, because the tree is immutable). Then we do an atomic swap from the old root to the new root. Readers do an atomic read of the current point-in-time root and perform their read operations lock-free. This is safe because the tree is immutable: readers don't need to worry about another thread modifying the tree concurrently, since any modification creates a new tree. This is a pattern we've standardized with a library we call MemDB [2].
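A minimal sketch of that swap pattern in Go, using only the standard library. A copy-on-write map stands in for the immutable radix tree (a real radix tree shares structure between versions instead of copying), and the names Store/Insert/Lookup are illustrative, not the go-immutable-radix API:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Store holds an immutable snapshot behind an atomic pointer.
// Writers serialize on mu; readers never take a lock.
type Store struct {
	mu   sync.Mutex                     // single-writer lock
	root atomic.Pointer[map[string]int] // current point-in-time snapshot
}

func NewStore() *Store {
	s := &Store{}
	empty := map[string]int{}
	s.root.Store(&empty)
	return s
}

// Insert takes the writer lock, builds a fresh snapshot, and
// atomically swaps the root. Readers see either the old tree
// or the new one, never a partial update.
func (s *Store) Insert(key string, val int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	old := *s.root.Load()
	next := make(map[string]int, len(old)+1)
	for k, v := range old { // full copy here; a radix tree would share nodes
		next[k] = v
	}
	next[key] = val
	s.root.Store(&next)
}

// Lookup is lock-free: one atomic load of the root, then reads
// against that immutable snapshot.
func (s *Store) Lookup(key string) (int, bool) {
	snap := *s.root.Load()
	v, ok := snap[key]
	return v, ok
}

func main() {
	s := NewStore()
	s.Insert("foo", 1)
	v, ok := s.Lookup("foo")
	fmt.Println(v, ok)
}
```

Because a reader works against whatever snapshot it loaded, it gets a consistent point-in-time view for free, which is the same property MemDB exposes as transactions.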
This has the advantage of making reads multi-core scalable with much lower lock contention. And since we typically use Raft for distributed consensus, there is only a single writer anyway (the FSM commit thread).
We apply this pattern in Vault, Consul, and Nomad, all of which are able to scale to many dozens of cores with a largely linear speedup in read performance.
[1] https://github.com/hashicorp/go-immutable-radix
[2] https://github.com/hashicorp/go-memdb