> parity/ECC

Parity is just ECC with 1 bit. ECC in practice usually means Reed-Solomon, which is a fancy name for generating more equations than you have data chunks; the extra equations are the redundancy. Usually you should aim for +20-40% redundancy.
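To make the "1 bit" case concrete, here's a toy sketch (plain XOR parity, nothing Reed-Solomon about it; the block contents and count are made up): one extra block, the XOR of all data blocks, is enough to rebuild any single missing block. With 4 data blocks that's +25% redundancy.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # 4 data blocks
parity = xor_blocks(data)                     # 1 extra block = +25% redundancy

# Lose any one block, e.g. block 2, and XOR the survivors with the parity:
lost = 2
survivors = [blk for i, blk in enumerate(data) if i != lost]
recovered = xor_blocks(survivors + [parity])

assert recovered == data[lost]
print("recovered:", recovered)
```

Reed-Solomon does the same trick over a finite field with more parity blocks, so it can survive more than one loss; that's where the +20-40% figure comes in.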

Ceph, HDFS and other distributed storage systems implement erasure coding (which is subtly different from error-correcting coding), and that's what I would recommend for handling backups.

The interesting thing about erasure codes is that you need to checksum your shards independently of the EC itself: the decoder can only rebuild shards it knows are missing, so if you feed it corrupted or wrong shards, you get corrupted data back.
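Roughly what that looks like, as a toy sketch (a single XOR parity shard standing in for real erasure coding; the shard contents and the SHA-256 choice are just illustrative): the per-shard checksum is what lets you demote a silently corrupted shard to a known erasure before decoding, which is the only case the code can actually repair.

```python
import hashlib
from functools import reduce

def xor(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_shards = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
shards = data_shards + [xor(data_shards)]                 # 4 data + 1 parity
checksums = [hashlib.sha256(s).digest() for s in shards]  # stored alongside shards

# Simulate silent corruption of shard 1 on disk:
shards[1] = b"BBBX"

# Verification pass: any shard failing its checksum is treated as erased,
# instead of being fed into the decoder as if it were good.
valid = [s for s, c in zip(shards, checksums)
         if hashlib.sha256(s).digest() == c]

assert len(valid) == len(shards) - 1, "toy single-parity code survives only one bad shard"
recovered = xor(valid)   # XOR of all remaining shards rebuilds the erased one
assert recovered == b"BBBB"
print("rebuilt shard:", recovered)
```

Skip the checksum step and the decoder would happily XOR the garbage shard back into the result.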

I think for backup (as in small-scale, "fits on one disk") error-correcting codes are not really a good approach, because IME hard disks with one error you notice have usually made many more errors already, or will do so shortly. In that case no ECC will help you. If, on the other hand, you're looking at an isolated error, then only very little data is affected (on average).

For example, a bit error in a chunk in a tool like borg/restic will only break that chunk: a piece of one file, or perhaps part of a directory listing.

So for these kinds of scenarios "just use multiple backup drives with fully independent backups" is better and simpler.
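To make the "only breaks that chunk" point above concrete, a toy sketch (fixed-size chunks and SHA-256, which is not borg/restic's actual format; they do content-defined chunking and have their own repository layout): flip one bit and exactly one chunk fails verification, everything else is still restorable.

```python
import hashlib

CHUNK = 4  # made-up chunk size, just for illustration

data = bytearray(b"some backup data that gets split into chunks")
chunks = [bytes(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
sums = [hashlib.sha256(c).hexdigest() for c in chunks]   # stored at backup time

data[10] ^= 0x01   # flip one bit somewhere in the middle

chunks_after = [bytes(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
bad = [i for i, (c, s) in enumerate(zip(chunks_after, sums))
       if hashlib.sha256(c).hexdigest() != s]
print("damaged chunks:", bad)   # exactly one chunk fails verification
```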

For small scale, use Dropbox or Google Drive or whatever, because at small scale the most important part of backup is actually reliably having it done. If you rely on a manual process, you're doomed. :)

For large-scale in-house things: Ceph regularly scrubs the data (i.e. compares checksums), and DreamHost has DreamObjects.

Thanks for mentioning borg/restic, I had never heard of them. (rsnapshot [rsync] works well, but it's not so shiny.) Deduplication sounds nice. (rsnapshot uses hardlinks.)
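In case the hardlink trick isn't obvious, here's roughly the idea as a sketch (rsnapshot itself drives rsync --link-dest / cp -al; the function and paths below are made up): files unchanged since the previous snapshot get hard-linked instead of copied, so every snapshot looks like a full tree but only changed files take space.

```python
import filecmp
import os
import shutil

def make_snapshot(src, prev_snap, new_snap):
    """Copy src into new_snap, hardlinking files that are identical to prev_snap."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dst_file = os.path.join(new_snap, rel, name)
            prev_file = os.path.join(prev_snap, rel, name) if prev_snap else None
            if prev_file and os.path.isfile(prev_file) and filecmp.cmp(src_file, prev_file, shallow=False):
                os.link(prev_file, dst_file)      # unchanged: hardlink, no extra space
            else:
                shutil.copy2(src_file, dst_file)  # new or changed: real copy

# e.g. make_snapshot("/home/me", "/backups/2016-01-01", "/backups/2016-01-02")
```

borg/restic go further and deduplicate at the chunk level, so even partially changed or renamed files don't get stored twice.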

That made me look for something btrfs-based, and this https://github.com/digint/btrbk seems useful (it sends btrfs snapshots to a remote somewhere, and the backups can also be encrypted); could work well for small setups.