Is there a dataset of HN titles? This made me want to fiddle with this, but step one is to get the data, and I don't want to crawl HN if the data has already been collected.

There are a few sources: the official API [0], the Algolia search API [1], and the BigQuery dataset, which is pretty up to date [2].

I used the Algolia search API; it has extremely generous rate limits and page limits.
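
In case it helps, here is a rough sketch of pulling story titles from the Algolia API with only the Python standard library. It is not the exact script I ran. The helper names (`build_url`, `extract_titles`, `fetch_titles`) and the page counts are made up for illustration; the endpoint `search_by_date` and the `tags`, `page`, and `hitsPerPage` parameters are the ones described in the Algolia docs at [1]. For a full crawl you would probably also want to window by date (the API accepts `numericFilters` on `created_at_i`), which this sketch leaves out.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the Algolia HN Search API docs; returns newest items first.
BASE_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_url(page, hits_per_page=100):
    """Build the query URL for one page of stories."""
    query = urllib.parse.urlencode(
        {"tags": "story", "page": page, "hitsPerPage": hits_per_page}
    )
    return f"{BASE_URL}?{query}"


def extract_titles(payload):
    """Return the non-empty titles from one decoded response page."""
    return [hit["title"] for hit in payload.get("hits", []) if hit.get("title")]


def fetch_titles(max_pages=3, fetch=None):
    """Collect titles from up to max_pages pages.

    `fetch` is a hypothetical hook mapping a URL to a decoded JSON dict,
    so the paging logic can be exercised without touching the network.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.load(resp)

    titles = []
    for page in range(max_pages):
        payload = fetch(build_url(page))
        titles.extend(extract_titles(payload))
        # Stop early once the API reports there are no further pages.
        if page + 1 >= payload.get("nbPages", 0):
            break
    return titles
```

Calling `fetch_titles(max_pages=1)` should give you a list of up to 100 recent story titles; adding a short `time.sleep` between pages would be polite for longer runs.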

[0]: https://github.com/HackerNews/API

[1]: https://hn.algolia.com/api

[2]: https://console.cloud.google.com/bigquery?p=bigquery-public-...