"For fun, I tried “de-archiving” an old colleague’s blog from The Wayback Machine. Getting the first few pages was easy, but getting the whole thing, and with quality/precision, was very hard."
Any good tool to extract a website from archive.org?
https://github.com/hartator/wayback-machine-downloader
(I discuss it in my https://www.gwern.net/Search tutorial and use it every once in a while, e.g. to make my mirror of 'Climb Mount Improbable' https://www.gwern.net/docs/genetics/selection/www.mountimpro... or 'Hard Truths From Soft Cats' https://www.gwern.net/images/hardtruthsfromsoftcats.tumblr.c... )
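For anyone curious what such a tool has to do, here is a rough sketch of the general approach in Python. It is not how wayback-machine-downloader itself is implemented. It assumes the Wayback Machine's public CDX index endpoint (`/cdx/search/cdx`) to list captured URLs under a site, and the `id_` suffix on a snapshot timestamp, which asks for the archived bytes without the Wayback toolbar and link rewriting. The helper names (`cdx_query_url`, `parse_cdx_json`, `raw_snapshot_url`, `list_snapshots`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX index endpoint of the Wayback Machine (assumed stable).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site: str) -> str:
    """Hypothetical helper: build a CDX query listing captures under `site`."""
    params = {
        "url": site.rstrip("/") + "/*",  # trailing /* requests a prefix match
        "output": "json",                # JSON rows; the first row is a header
        "fl": "timestamp,original",      # only return the fields we need
        "filter": "statuscode:200",      # skip redirects and error captures
        "collapse": "urlkey",            # keep only the first capture per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body: str) -> list[tuple[str, str]]:
    """Hypothetical helper: turn a CDX JSON response into (timestamp, url) pairs."""
    rows = json.loads(body) if body.strip() else []
    return [(ts, url) for ts, url in rows[1:]]  # drop the header row


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """Hypothetical helper: URL of the unmodified archived capture."""
    # "id_" after the timestamp requests the original bytes, without the
    # Wayback toolbar or rewritten links.
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def list_snapshots(site: str) -> list[tuple[str, str]]:
    """Hypothetical helper: query the live CDX index (needs network access)."""
    with urlopen(cdx_query_url(site)) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

Usage would look like `for ts, url in list_snapshots("example.com"): fetch(raw_snapshot_url(ts, url))`, where `fetch` is whatever download-and-save logic you want. The hard part the quoted post complains about (missing assets, picking the best capture per page, rewriting links for a local mirror) sits on top of this skeleton.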