Seems like a lot of interesting sites recognise that I am scraping and block access with a CAPTCHA. I've no idea how to get around that.

How are you doing it? Automating an actual browser using the developer tools is probably the most under-the-radar way, and also quite nice to work with.

See: https://github.com/jawj/web-scraping-for-researchers
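
For what it's worth, here is a rough sketch of the kind of thing you can run from the developer-tools console on a page you already have open. It is not taken from the linked repo: `collectLinks`, the `ElementLike` and `RootLike` shapes, and the `a.result` selector are all made up for illustration. It's written in TypeScript, so strip the type annotations (or compile it) before pasting it into the console.

```typescript
// Minimal shapes of the DOM pieces used below (hypothetical names),
// so the function can also be tried outside a browser with fake objects.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

interface RootLike {
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// Collect the visible text and href of every element matching `selector`.
function collectLinks(
  root: RootLike,
  selector: string
): { text: string; href: string | null }[] {
  return Array.from(root.querySelectorAll(selector)).map((el) => ({
    text: (el.textContent ?? "").trim(),
    href: el.getAttribute("href"),
  }));
}

// In the console of an open page you would call it on the live document,
// with a selector that matches the site you are looking at, e.g.:
//   console.table(collectLinks(document, "a.result"));
```

Since this runs inside your ordinary browser session, the page is loaded the same way as when you click through by hand. Whether that is enough to avoid the CAPTCHA will depend on the site.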