I've been looking for an AI/GPT/deep learning tool that would help me perform some sanitization and normalization of a large data set that's quite personal to me: my last.fm data, time-stamped logs of (nearly) every song I've listened to for almost twenty years now. The data has all kinds of issues. For example, yesterday I realized I had two sets of logs for one album: one version of the album used U+2026 (…) and the other used three periods (...). There are problems like that, things more akin to typos, styling differences (& vs. and), and even garbage-in garbage-out issues (YouTube Music changing the tags on the same album over time, making it look like I actually listened to different albums, or tracks not having all of the tags they're supposed to have).
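For what it's worth, the specific variants described above (U+2026 vs. three periods, & vs. "and") can be collapsed with plain Unicode normalization before reaching for anything ML-based. A minimal sketch in Python, assuming you're comparing raw title strings; NFKC happens to decompose U+2026 into three ASCII periods:

```python
import re
import unicodedata

def normalize_title(s: str) -> str:
    """Build a canonical comparison key for a track/album/artist string."""
    # NFKC maps compatibility characters, e.g. U+2026 (…) -> "..."
    s = unicodedata.normalize("NFKC", s)
    s = s.replace("&", "and")           # styling variant: "&" vs. "and"
    s = re.sub(r"\s+", " ", s).strip()  # collapse stray whitespace
    return s.casefold()                 # case-insensitive key
```

Two logs of the same album then compare equal: `normalize_title("Time…") == normalize_title("Time...")` is `True`. This won't catch genuine typos or retagged albums, but it shrinks the problem before any fuzzier matching.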
I've got .NET code that hits the last.fm API and dumps the info into a LiteDB database, so I can export to CSV pretty easily if a tool like that would be useful, unless anyone has a better direction to point me in. Appreciate any thoughts you folks have.