Hi there! Author here.
I initially created nFreezer for my own needs, because when doing remote backups (especially to servers to which we never have physical access), it's hard to 100% trust the destination server.
Being curious: how do you usually do remote backups of your important files?
--> With the usual solutions, even if you use SSH/SFTP and an encrypted partition on the destination, there is a short time during which the data is unencrypted on the remote server: just before it is written to disk, before it reaches the encrypted filesystem layer.
Hence nFreezer: the data is never decrypted on the remote server.
How do you work with this?
borg[1] has become the de facto standard for this use-case.[2]
It can run over SSH with the borg binary on the remote server or it can run in an SFTP mode with nothing installed on the destination.
Now that I've looked into it, borg does indeed seem pretty good for this use case.
But still there is one thing (at least) that can be of interest to some people with nFreezer: it is very simple, it does the job in only 249 lines of code. You can then read the FULL source code in a few hours, to see if you trust it or not. See here: https://github.com/josephernest/nfreezer/blob/master/nfreeze...
If I wanted to do this with the source code of the tool you mentioned, I would have to spend at least a full week. (This is normal: that program has 100 times more features.)
The key point is: if you're looking for a solution for which you don't want to trust a remote server, then you probably don't want to trust the backup tool of a random internet person either. And you probably want to read the source code of the program.
So having fewer than 300 lines of code to read, in a single .py file, can be an advantage.
From https://github.com/josephernest/nfreezer/blob/master/nfreeze..., this is how you decrypt files:
  [...]
  with open(f2, 'wb') as f, src_cm.open(chunkid.hex(), 'rb') as g:
      decrypt(g, pwd=encryptionpwd, out=f)
  [...]
  def decrypt(f=None, s=None, pwd=None, out=None):
      [...]
      while True:
          block = f.read(BLOCKSIZE)
          if not block:
              break
          out.write(cipher.decrypt(block))
      try:
          cipher.verify(tag)
      except ValueError:
          print('Incorrect key or file corrupted.')
So, basically, you decrypt the whole file and write the result before checking the tag. You're using a block cipher in a streaming fashion, and as has already been said before (see https://www.imperialviolet.org/2015/05/16/aeads.html, "AEADs with large plaintexts"), this is dangerous if you don't do it correctly. Your data may be garbage, but by the time you know it's bad, it's too late: it has already been written to disk, it's not deleted, and you won't know which file it was.
As some HN crypto celebrity said some time ago, if you write "AES" in your code then you're wrong. You MUST use misuse-resistant libraries unless you know exactly what you're doing.
TL;DR: your crypto is broken, use NaCl instead of doing it yourself.
      except ValueError:
          print('Incorrect key or file corrupted.')
So if there is a tag problem, this is clearly logged, and you know something is wrong. The good thing is that you can easily edit it and have instead (pass the filename fn to the function, to be able to log):
      print('Incorrect key or file corrupted, will be deleted:', fn)
      os.remove(fn)
      exit(...)
Real question: is it possible to `.verify(tag)` before having run decrypt(...) on the whole file? I doubt it. So one option could be to write the file to a temporary place and then, only once the tag is verified, move it to the right place (and delete it if it is not verified). Another option would be to do a first pass of decrypt(), without writing anything to disk, then get the tag, verify it, and then if ok, redo the whole decryption with writing on disk this time. The latter might be a bit too extreme and halves the performance.
> So if there is a tag problem, this is clearly logged, and you know something is wrong [...] The good thing is that you can easily edit it and have instead (pass the filename fn to the function, to be able to log)
That's the thing: the script in its current version is incorrect, and even doing that won't be a perfect solution. That's why other people are saying that you should study existing software, which is widely used and can do more than nFreezer, before trying to do it your own way.
It's good to not rely on anyone else, but crypto is the one domain where you can't have "good enough" -- it's either correct, or it's not.
> Another option would be to do a first pass of decrypt(), without writing anything to disk, then get the tag, verify it, and then if ok, redo the whole decryption with writing on disk this time
Yep, that's the way: do the decrypting in memory, or in /tmp, verify the tag, and only after you can put the file where it belongs. I just checked the API of the crypto module, and there's a `decrypt_and_verify` that should do it properly.
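To make the "decrypt somewhere temporary, verify, then move" idea concrete, here is a minimal sketch using only the standard library. The names `restore_verified` and `decrypt_into` are made up for this example and are not part of nFreezer; the callback is assumed to write the plaintext to the file object it receives and to raise ValueError when the tag does not verify, like `cipher.verify(tag)` in the snippet quoted earlier.

```python
import os
import tempfile


def restore_verified(dest_path, decrypt_into):
    """Write to a temp file and move it to dest_path only if verification passes.

    decrypt_into(fileobj) is a hypothetical callback: it writes the plaintext
    to fileobj and raises ValueError if the authentication tag is wrong.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file next to the destination so the final move is a
    # rename on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            decrypt_into(tmp)  # may raise ValueError on a bad tag
        # Only reached when verification succeeded.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Unverified plaintext never reaches dest_path; drop the temp file.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The temp file lives in the destination directory rather than in /tmp so that `os.replace` does not have to cross filesystems. The whole plaintext still hits the disk once before verification, so this only fixes the "garbage left in place" problem, not the cost for big files.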
Of course that's problematic especially for big files, so what you want to do is chunk the files, encrypt the chunks separately and store the file as a list of such chunks.
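A rough sketch of what "store the file as a list of chunks" can look like, with fixed-size chunks for simplicity. Everything here is an assumption made for illustration: `store_file`, `load_file`, the dict used as a store, and the use of a plain SHA-256 as chunk id. Encryption is deliberately left out; in a real tool each chunk would be encrypted and authenticated on its own (own nonce, own tag), and the hash check below merely stands in for that per-chunk tag verification.

```python
import hashlib


def store_file(data, store, chunk_size=1 << 20):
    """Split data into fixed-size chunks, save each one, return the chunk ids."""
    manifest = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        # A real tool would encrypt and authenticate `chunk` here before
        # storing it; this sketch stores it as-is.
        store[chunk_id] = chunk
        manifest.append(chunk_id)
    return manifest


def load_file(manifest, store):
    """Rebuild the file, checking every chunk in memory before using it."""
    parts = []
    for chunk_id in manifest:
        chunk = store[chunk_id]
        # Stand-in for verifying the chunk's authentication tag.
        if hashlib.sha256(chunk).hexdigest() != chunk_id:
            raise ValueError("chunk %s is corrupted" % chunk_id)
        parts.append(chunk)
    return b"".join(parts)
```

Because each chunk is small, it can be verified entirely in memory before a single byte of it is written out, which is what makes the approach workable for big files.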
The step after that is to use Content-Defined Chunking, i.e. chunking based on the content of the file. This way, when a big file is modified, only the chunks around the modification change; the rest of the file is chunked exactly the same way as before. So you don't need to store the full content of each version of the file, just a small-ish diff.
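For the curious, a toy version of content-defined chunking. This is a sketch of the general idea using a simple "gear" rolling hash, not the algorithm used by bup, borg, restic or tarsnap; the function name `cdc_chunks`, the table seed and all the size parameters are arbitrary choices made for this example.

```python
import random

_rng = random.Random(0)  # fixed seed so chunk boundaries are reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random value per byte
MASK64 = (1 << 64) - 1


def cdc_chunks(data, avg_bits=10, min_size=64, max_size=8192):
    """Cut data where a rolling hash of the most recent bytes hits a pattern."""
    boundary_mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left means the low avg_bits bits of h depend only on the
        # last avg_bits bytes, so a cut point is decided by local content.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h & boundary_mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks
```

Chunks that lie entirely before an edit come out byte-for-byte identical, and chunks after it usually realign within a few cuts because the cut points follow the content rather than the offsets. A backup tool built on this would then only need to upload the chunks it has not already stored.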
That's not a novel system, bup (https://github.com/bup/bup) kinda pioneered it... and as others have advised, restic, borg-backup and tarsnap do exactly that.
According to Wikipedia, bup was released in 2010, 3 years after Tarsnap started doing this. (And Tarsnap wasn't the first either.)