How do people archive sites 1:1 using tools like Playwright? I’ve tried taking screenshots (they look weird) and pulling the page content (I have trouble viewing articles from sites like Medium).

It's ultimately a cat and mouse game with many websites actively trying to sabotage archival efforts.

Unless this is a core functionality in something you're working on, most people will be better off using the SavePageNow API from archive.org and integrating with that. This is what I ultimately ended up doing for one of my projects.[1]

[1]: https://lgug2z.com/articles/notado-07-2023-update/
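For reference, here is a minimal Python sketch of what integrating with Save Page Now can look like. It assumes the authenticated SPN2 API as I understand archive.org's public docs: a POST to https://web.archive.org/save with an `Authorization: LOW <access>:<secret>` header (keys from your archive.org account's S3 keys page), a JSON response carrying a `job_id`, and a status endpoint at `/save/status/<job_id>`. The helper names (`build_spn2_request`, `status_url`, `archive`) are hypothetical, and the response fields should be checked against the current docs before relying on them.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: authenticated Save Page Now 2 (SPN2) endpoint per archive.org's public docs.
SPN2_ENDPOINT = "https://web.archive.org/save"


def auth_headers(access_key: str, secret_key: str) -> dict:
    # Assumption: SPN2 takes archive.org S3-style keys as "LOW access:secret".
    return {
        "Accept": "application/json",
        "Authorization": f"LOW {access_key}:{secret_key}",
    }


def build_spn2_request(url: str, access_key: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking SPN2 to capture `url`."""
    body = urllib.parse.urlencode({"url": url}).encode("ascii")
    return urllib.request.Request(
        SPN2_ENDPOINT, data=body, headers=auth_headers(access_key, secret_key), method="POST"
    )


def status_url(job_id: str) -> str:
    # Assumption: capture jobs are polled at /save/status/<job_id>.
    return f"{SPN2_ENDPOINT}/status/{urllib.parse.quote(job_id)}"


def archive(url: str, access_key: str, secret_key: str,
            poll_every: float = 5.0, max_polls: int = 60) -> dict:
    """Submit `url`, then poll the job until it is no longer "pending"."""
    submit = build_spn2_request(url, access_key, secret_key)
    with urllib.request.urlopen(submit, timeout=60) as resp:
        job = json.load(resp)  # expected (assumption) to contain "job_id"
    poll = urllib.request.Request(status_url(job["job_id"]),
                                  headers=auth_headers(access_key, secret_key))
    for _ in range(max_polls):
        with urllib.request.urlopen(poll, timeout=60) as resp:
            status = json.load(resp)
        # Assumed status values: "pending", "success", "error".
        if status.get("status") != "pending":
            return status
        time.sleep(poll_every)
    raise TimeoutError(f"capture of {url} still pending after {max_polls} polls")
```

If the job succeeds, the status payload should include the capture timestamp, and the snapshot can then be viewed at the usual Wayback Machine address, `https://web.archive.org/web/<timestamp>/<original url>`. Polling rather than blocking on the submit call is a deliberate choice: captures of heavy pages can take a while, and the submit request returns as soon as the job is queued.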

Website owners can request that their site be excluded from archiving, so this doesn't work for all websites.

Yeah, this is true, but it works for enough websites to be a meaningful option, and it will almost always work better than something you've home-rolled (unless your core product is a direct competitor or something, in which case all bets are off ;))

A good example of this is Pinboard, which claims to offer website archiving. A friend has over 100,000 links saved there (with an archival account). When we spent a few minutes looking at the archive links for those items a few weeks ago, we couldn't find a single correct, working, accessible archive among the most recently archived links, and those were listed as archived 5 weeks ago, so the archive wasn't up to date either.

I wonder if Firefox's "reader mode as a utility" might be a viable alternative for Pinboard-style, "content-oriented" archiving?

https://github.com/mozilla/readability
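Readability itself is a JavaScript library that needs a DOM, so in practice you'd run it inside a browser page (for example one driven by Playwright) or under something like jsdom in Node. To illustrate what "content-oriented" archiving keeps and throws away, here is a much cruder, dependency-free Python stand-in. It is NOT Readability's scoring algorithm, just a hypothetical heuristic (`extract_main_text` is my name for it): keep headings, paragraphs, and list items; drop script/style/nav/header/footer/aside; and prefer text inside an `<article>` element when one exists. It assumes reasonably well-formed markup with explicitly closed `<p>`/`<li>` tags.

```python
from html.parser import HTMLParser

# Tags whose contents are almost never article text.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form"}
# Block-level tags whose text we keep, one output block per outermost element.
KEEP_TAGS = {"h1", "h2", "h3", "p", "li", "pre", "blockquote"}


class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.keep_depth = 0   # > 0 while inside a KEEP_TAGS element
        self.in_article = 0   # > 0 while inside an <article>
        self.blocks = []      # list of (inside_article, text) pairs
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "article":
            self.in_article += 1
        elif tag in KEEP_TAGS:
            if self.keep_depth == 0:
                self._buf = []  # start collecting a new block
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "article" and self.in_article:
            self.in_article -= 1
        elif tag in KEEP_TAGS and self.keep_depth:
            self.keep_depth -= 1
            if self.keep_depth == 0:
                # Collapse runs of whitespace; drop blocks that ended up empty.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.blocks.append((self.in_article > 0, text))

    def handle_data(self, data):
        # Only keep text that is inside a kept block and not inside a skipped one.
        if self.keep_depth and not self.skip_depth:
            self._buf.append(data)


def extract_main_text(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    article_blocks = [text for in_article, text in parser.blocks if in_article]
    # Fall back to every kept block if the page has no <article> element.
    chosen = article_blocks or [text for _, text in parser.blocks]
    return "\n\n".join(chosen)
```

For a JavaScript-heavy site like Medium, you'd feed this the HTML after rendering (for example from Playwright's `page.content()`) rather than the raw HTTP response.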