What does HackerNews think of wayback-machine-downloader?

The “faces of Amazon” ex-employee stories site has mysteriously vanished | Jul 2023

Came here to say this. My guess is that Amazon paid them to go away. If my guess is accurate (I could certainly be wrong), then Amazon could have them add a robots.txt banning archive.org. If they do that, access to the archive will be removed. Mirror it now if you want the content.

One nice way to do so (handy for any site that you think may vanish from the Wayback Machine): https://github.com/hartator/wayback-machine-downloader

We're improving search results when you use quotes | Aug 2022

There are a few out there ready to go. Here's one:

https://github.com/hartator/wayback-machine-downloader

You just dump the site, sync it to S3, and use Terraform to provision the Route 53 and bucket setups.

Yes, they are mostly content sites. The hardest part is filtering out adult domains, assuming you don't want them. There are a staggering number of adult domains that expire every year and get huge traffic.

Ask HN: Does anyone have an archive of quirky.com? | Jul 2021

Some of it is on archive.org:

https://web.archive.org/web/20140607001646/https://www.quirk...

You can use this tool to download the files for the site:

https://github.com/hartator/wayback-machine-downloader/
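
For anyone curious what "download the files from archive.org" involves underneath, here is a minimal Python sketch built on the Wayback Machine's public CDX API. It is not the linked tool's code. The function names and the `example.com` placeholder are assumptions made up for illustration. It only builds the query URL and turns a CDX JSON response into raw snapshot URLs; fetching and saving the files is left out.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(site: str) -> str:
    """Build a CDX API URL listing successful captures under `site`."""
    params = {
        "url": site,
        "matchType": "prefix",       # every URL that starts with `site`
        "output": "json",            # JSON rows; the first row is the header
        "fl": "timestamp,original",  # only the fields needed below
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "digest",        # drop adjacent captures with identical content
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(rows: list[list[str]]) -> list[str]:
    """Turn CDX JSON rows into URLs for the unmodified archived files."""
    if not rows:
        return []
    header, *captures = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    # The `id_` suffix asks the Wayback Machine for the original bytes,
    # without the toolbar or rewritten links.
    return [
        f"https://web.archive.org/web/{row[ts]}id_/{row[orig]}"
        for row in captures
    ]


if __name__ == "__main__":
    # `example.com` is a placeholder, not a site from this thread.
    print(build_cdx_query("example.com"))
```

Fetching the printed URL returns the rows that `snapshot_urls` expects. Each resulting URL can then be downloaded with any HTTP client.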

Blog with Markdown and Git, and degrade gracefully through time | Feb 2021

https://github.com/hartator/wayback-machine-downloader

(I discuss it in my https://www.gwern.net/Search tutorial and use it every once in a while, e.g. to make my mirror of 'Climb Mount Improbable' https://www.gwern.net/docs/genetics/selection/www.mountimpro... or 'Hard Truths From Soft Cats' https://www.gwern.net/images/hardtruthsfromsoftcats.tumblr.c... )

Verelox Wiped by Ex-Admin | Jun 2017

They should be able to use https://github.com/hartator/wayback-machine-downloader and get at least a static version of the website back online.

Internet Archaeology: Scraping time series data from Archive.org | Apr 2017

Really cool, congrats!

I have built something similar, but to retrieve a backup for one of my dead websites. It was a fun project.

Shameless plug: https://github.com/hartator/wayback-machine-downloader/

The c2 wiki was down | Oct 2016

  wayback_machine_downloader www.c2.com -c 20

Ref: https://github.com/hartator/wayback-machine-downloader

Offer HN: Free logo design for an open source project | Aug 2016

We don't have a logo yet!

Download an entire website from the Internet Archive Wayback Machine.

https://github.com/hartator/wayback-machine-downloader

Waybackpack: download the entire Wayback Machine archive for a given URL | May 2016

Ha, fun. I made a similar tool not so long ago: https://github.com/hartator/wayback-machine-downloader/

Waybackpack: download the entire Wayback Machine archive for a given URL | May 2016

I've been using this with great success too: https://github.com/hartator/wayback-machine-downloader