> We can clearly see how our old CPUs were reaching their limit. In the week before we upgraded our primary database server, its CPU usage (from /proc/stat) averaged over 90%

This strikes me as odd. In my experience, traditional OLTP row stores are bound by I/O and by contention (locking and latching), not by CPU. Does anyone have an explanation for this?
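
For reference, here's a minimal sketch of how a figure like that is typically derived from /proc/stat (the post doesn't say exactly how they computed it): sample the aggregate "cpu" counters twice and take the busy fraction over the interval. The 5-second interval here is arbitrary.

```python
import time

def read_cpu_counters():
    """Return (total, idle) jiffies from the aggregate "cpu" line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]    # user nice system idle iowait irq softirq steal ...
    idle = int(fields[3]) + int(fields[4])   # idle + iowait
    total = sum(int(v) for v in fields[:8])  # user..steal; guest time is already counted in user
    return total, idle

t1, i1 = read_cpu_counters()
time.sleep(5)
t2, i2 = read_cpu_counters()
print(f"CPU usage: {100 * (1 - (i2 - i1) / (t2 - t1)):.1f}%")
```

Note that iowait counts as idle time in this measure, so a server that was mostly waiting on its disks would show low usage; a sustained 90%+ average really does point at the CPUs.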

> Once you have a server full of NVMe drives, you have to decide how to manage them. Our previous generation of database servers used hardware RAID in a RAID-10 configuration, but there is no effective hardware RAID for NVMe, so we needed another solution... we got several recommendations for OpenZFS and decided to give it a shot.

Again, traditional OLTP row stores have long included a mechanism for recovering from media failure: place the WAL on a separate device from the DB. Early MySQL used a proprietary backup add-on as a revenue model, so maybe this technique is now obfuscated and/or missing. You may still need or want a mechanism to federate the DB devices, and incremental volume snapshots are far superior to full DB backups, but placing the WAL on a separate device is a fantastic technique for both performance and availability.
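
As a concrete illustration for MySQL/InnoDB, the split takes only a few my.cnf settings. This is a hypothetical sketch, not Let's Encrypt's configuration, and the paths are made up:

```ini
[mysqld]
datadir                        = /data/mysql     # tablespaces on the main pool
innodb_log_group_home_dir      = /wal/mysql      # InnoDB redo log on its own device
log_bin                        = /wal/mysql-bin  # binary log on the same separate device
sync_binlog                    = 1               # fsync the binlog at every commit
innodb_flush_log_at_trx_commit = 1               # fsync the redo log at every commit
```

With both durability settings at 1, the separate device absorbs the small sequential commit-time writes while the data device handles the random page I/O.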

The Let's Encrypt post does not describe how they implement off-machine and off-site backup-and-recovery. I'd like to know if and how they do this.

> The Let's Encrypt post does not describe how they implement off-machine and off-site backup-and-recovery. I'd like to know if and how they do this.

The section:

> There wasn’t a lot of information out there about how best to set up and optimize OpenZFS for a pool of NVMe drives and a database workload, so we want to share what we learned. You can find detailed information about our setup in this GitHub repository.

points to: https://github.com/letsencrypt/openzfs-nvme-databases

Which states:

> Our primary database server rapidly replicates to two others, including two locations, and is backed up daily. The most business- and compliance-critical data is also logged separately, outside of our database stack. As long as we can maintain durability for long enough to evacuate the primary (write) role to a healthier database server, that is enough.

Which sounds like a traditional master/slave setup with failover?
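
In MySQL terms, evacuating the primary (write) role amounts to promoting a replica. Below is a minimal sketch of that promotion step, assuming classic asynchronous replication and mysql-connector-python; the hostname and credentials are made up, and real deployments typically drive this from tooling such as Orchestrator rather than a hand-rolled script.

```python
import mysql.connector  # pip install mysql-connector-python

def promote_replica(host: str, user: str, password: str) -> None:
    """Turn a healthy, caught-up replica into the new writable primary."""
    conn = mysql.connector.connect(host=host, user=user, password=password)
    cur = conn.cursor()
    cur.execute("STOP REPLICA")        # "STOP SLAVE" before MySQL 8.0.22
    cur.execute("RESET REPLICA ALL")   # discard the old primary's coordinates
    cur.execute("SET GLOBAL super_read_only = OFF")
    cur.execute("SET GLOBAL read_only = OFF")  # start accepting writes
    cur.close()
    conn.close()

promote_replica("db-replica-1.example.internal", "admin", "secret")
```

The remaining replicas would then be re-pointed at the new primary (CHANGE REPLICATION SOURCE TO in MySQL 8.0.23+, CHANGE MASTER TO before that), and clients redirected via DNS, a VIP, or a proxy layer.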