On this note: is anyone familiar with a similar implementation, preferably one like Outline.com that copies the whole content, but in a local, non-web-based format?

I'm writing a Rust library that, for archival purposes, needs to refer to source content immutably. In short, I need to download the content and store it immutably. But I don't want to grab all the HTML, UI images, ads, etc. I just want the content. I've found Outline.com amazing, but the tool I'm writing is "distributed", so I don't want to depend on a service.
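
To make "store it immutably" concrete, here is a rough sketch of one possible approach, content addressing: name each stored file after the SHA-256 of its text, so the name doubles as a stable reference. It's in Python only to keep it short (the same idea ports to Rust), and the function name `store_immutably` and the directory layout are made up for illustration. It assumes the article text has already been extracted by some other tool.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_immutably(content: str, root: Path) -> Path:
    """Write extracted text under its SHA-256 digest and return the path.

    Hypothetical helper: the name and the <root>/<2 hex chars>/<digest>.txt
    layout are illustrative assumptions, not taken from any existing tool.
    """
    data = content.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / f"{digest}.txt"
    if path.exists():
        return path  # same content always maps to the same name
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place,
    # so a reader never sees a partially written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    # Read-only bit guards against accidental edits; it is not a hard guarantee.
    os.chmod(path, 0o444)
    return path
```

Since the path is derived from the content, anyone holding the digest can check that a copy hasn't been altered, which seems like a reasonable fit for a distributed tool.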

Is anyone familiar with local tooling that does what these services do (TLDR, Outline, etc.)?

There is a Python library called Newspaper that is designed to do that. I believe this is what outline.com uses.

There is also a JS library called Readability, which is what Firefox's reader mode uses.

https://newspaper.readthedocs.io/en/latest/

https://github.com/mozilla/readability