What does HackerNews think of wayback-machine-downloader?
Download an entire website from the Wayback Machine.
One nice way to do so (handy for any site that you think may vanish from the Wayback Machine): https://github.com/hartator/wayback-machine-downloader
https://github.com/hartator/wayback-machine-downloader
You just dump the site, sync it to S3, and use Terraform to provision the Route 53 and bucket setups.
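A rough sketch of the "dump and sync" step, assuming the downloader wrote the site under `./websites/<domain>/` and that the AWS CLI is installed and configured. The bucket name and the `build_sync_command` helper are hypothetical, made up for illustration; the Terraform side is not shown.

```python
from typing import List


def build_sync_command(domain: str, bucket: str, root: str = "websites") -> List[str]:
    """Build an `aws s3 sync` argument list for a downloaded site.

    Assumes the site files live in <root>/<domain>/ (an assumption about
    where the dump was written) and that `bucket` already exists.
    """
    local_dir = f"{root}/{domain}"
    return ["aws", "s3", "sync", local_dir, f"s3://{bucket}/"]


# Hypothetical bucket name, shown for illustration only.
print(" ".join(build_sync_command("example.com", "my-restored-site")))
```

The printed command can be passed to `subprocess.run` or pasted into a shell once the bucket has been provisioned.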
Yes, they are mostly content sites. The hardest part is filtering out adult domains, assuming you don't want them. A staggering number of adult domains expire every year and get huge traffic.
https://web.archive.org/web/20140607001646/https://www.quirk...
You can use this tool to download the files for the site:
(I discuss it in my https://www.gwern.net/Search tutorial and use it every once in a while, e.g. to make my mirror of 'Climb Mount Improbable' https://www.gwern.net/docs/genetics/selection/www.mountimpro... or 'Hard Truths From Soft Cats' https://www.gwern.net/images/hardtruthsfromsoftcats.tumblr.c... )
I have built something similar, but to retrieve a backup for one of my dead websites. It was a fun project.
Shameless plug: https://github.com/hartator/wayback-machine-downloader/
wayback_machine_downloader www.c2.com -c 20
Ref: https://github.com/hartator/wayback-machine-downloader
Download an entire website from the Internet Archive Wayback Machine.
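For scripting the `wayback_machine_downloader www.c2.com -c 20` command quoted above, here is a minimal Python sketch. It assumes the gem is installed and on `PATH`, and that `-c` sets the number of concurrent downloads and `-d` the output directory, as the tool's README describes. The `build_command` and `download` helpers are hypothetical names, not part of the tool.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(site: str, concurrency: int = 1,
                  directory: Optional[str] = None) -> List[str]:
    """Build the argument list for a wayback_machine_downloader run."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    cmd = ["wayback_machine_downloader", site]
    if concurrency > 1:
        # -c: number of files to download at the same time
        cmd += ["-c", str(concurrency)]
    if directory is not None:
        # -d: directory to save the downloaded files into
        cmd += ["-d", directory]
    return cmd


def download(site: str, concurrency: int = 1,
             directory: Optional[str] = None) -> int:
    """Run the downloader and return its exit code (needs the gem installed)."""
    cmd = build_command(site, concurrency, directory)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Calling `download("www.c2.com", concurrency=20)` would run the same command as the one in the comment; a high concurrency value speeds up large sites at the cost of more simultaneous requests to the archive.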