What does HackerNews think of wikipedia_ql?
Query language for efficient data extraction from Wikipedia
Language: Python
You might be interested in https://github.com/zverok/wikipedia_ql
Or use something like https://github.com/zverok/wikipedia_ql, which uses the MediaWiki API
I think this project is not getting enough attention: https://github.com/zverok/wikipedia_ql
It lets you query Wikipedia (not Wikidata, but the actual human-readable text) more or less directly, mixing the way you would describe a scraper with some nicer higher-level constructs.
Can't vouch for its performance, but the API is interesting and nice.
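For context on the "uses MediaWiki API" comment above, here is a minimal sketch of fetching a page's rendered HTML through the public MediaWiki `action=parse` endpoint, using only the Python standard library. This is not wikipedia_ql's own interface or implementation; the function names, the User-Agent string, and the example page title are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public MediaWiki API endpoint for English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page: str) -> str:
    """Build an action=parse URL asking for the rendered HTML of `page`."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # with version 2, parse.text is a plain string
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_page_html(page: str) -> str:
    """Fetch the rendered HTML of a page (hypothetical helper, needs network)."""
    request = urllib.request.Request(
        build_parse_url(page),
        # Illustrative User-Agent; replace with one identifying your own tool.
        headers={"User-Agent": "wikipedia-example/0.1 (demo script)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    return data["parse"]["text"]


if __name__ == "__main__":
    # Print the first few hundred characters of the article's HTML.
    print(fetch_page_html("Pink Floyd")[:300])
```

The HTML returned this way still has to be parsed and filtered by hand, which is the part a query language over the article text is meant to make nicer.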