What does HackerNews think of warc-proxy?

Serving content from a WARC

Language: Python

I thought that mitmproxy did this, but a cursory search didn't turn anything up. That said, their actual format[1] has even more fidelity (I'd guess it's comparable to Wireshark).

One should be aware that WARC is great for preservation, but getting content back out of it requires specialized tooling à la https://github.com/alard/warc-proxy
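To give a feel for why tooling is needed, here is a minimal stdlib-only sketch of pulling a stored response back out of a WARC by URL. It is not how warc-proxy itself works. It assumes an uncompressed WARC/1.0 stream, whereas real archives are usually gzipped per record, and a dedicated library such as warcio is the safer choice. The names `iter_warc_records`, `find_response`, and `SAMPLE` are made up for illustration.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() == b"":
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the WARC header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The record block is exactly Content-Length bytes long.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


def find_response(stream, url):
    """Return the raw stored HTTP response for `url`, or None if absent."""
    for headers, block in iter_warc_records(stream):
        if (headers.get("WARC-Type") == "response"
                and headers.get("WARC-Target-URI") == url):
            return block
    return None


# A tiny hand-built archive with a single response record (illustrative only).
_http = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
SAMPLE = b"".join([
    b"WARC/1.0\r\n",
    b"WARC-Type: response\r\n",
    b"WARC-Target-URI: http://example.com/\r\n",
    b"Content-Length: " + str(len(_http)).encode("ascii") + b"\r\n",
    b"\r\n",
    _http,
    b"\r\n\r\n",
])

if __name__ == "__main__":
    raw = find_response(io.BytesIO(SAMPLE), "http://example.com/")
    # Split the stored HTTP message into its header part and its body.
    head, _, body = raw.partition(b"\r\n\r\n")
    print(head.split(b"\r\n", 1)[0].decode("ascii"))
    print(body.decode("utf-8"))
```

A linear scan like this is fine for a small file. For anything large you would presumably want an index from URL to byte offset, which is part of what the specialized tools provide.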

1: https://github.com/mitmproxy/mitmproxy/blob/9.0.1/mitmproxy/...

You can probably make your service do both screenshots and WARC. Instead of loading a site directly, load it through WARC Proxy (https://github.com/odie5533/WarcProxy); that will write out a WARC file, and you can still store your screenshot.
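For a rough idea of what a recording proxy appends to the file for each fetched page, here is a minimal sketch of building one WARC `response` record with only the standard library. This is not WarcProxy's actual code. The function name `build_response_record` is hypothetical, and a real writer would typically also emit `warcinfo` and `request` records and gzip each record.

```python
import uuid
from datetime import datetime, timezone


def build_response_record(target_uri, raw_http_response):
    """Wrap a raw HTTP response (status line, headers, body) in a WARC record."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        # Content-Length counts the bytes of the record block only.
        ("Content-Length", str(len(raw_http_response))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Every record is followed by two CRLFs.
    return head.encode("utf-8") + raw_http_response + b"\r\n\r\n"


if __name__ == "__main__":
    raw = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
    # "example.warc" is a placeholder output path.
    with open("example.warc", "ab") as out:
        out.write(build_response_record("http://example.com/", raw))
```

The screenshot step stays independent of this: the proxy sees the same bytes the browser does, so the WARC and the screenshot come from a single page load.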

Once you have the WARCs, you can upload them to Archive.org, where they can be added to the Wayback Machine, or you can set up your own service for browsing them, built on something like warc-proxy https://github.com/alard/warc-proxy (yeah, same name, different purpose...).

There is also a MITM version of WARCProxy that will let you store HTTPS sites: https://github.com/odie5533/WarcMITMProxy

Just to second what donpdonp said (https://news.ycombinator.com/item?id=6509604), I think a service like this needs to offer a download in the standard WARC format (http://archive-access.sourceforge.net/warc/).

The whole point of a service like this is long-term access and that really requires a data checkout option which can be used with other tools (e.g. https://github.com/alard/warc-proxy).