Is there an easy way to detect a specific word in an audio sample and get the timestamp of each occurrence? I've been trying to implement something like this but wasn't sure how to approach it.

If you already have the transcript without timestamps (e.g. for an audiobook where you know the source text), you could use https://github.com/readbeyond/aeneas , which infers the timestamps by running the text through text-to-speech and aligning that synthesized audio with the real recording using dynamic time warping.
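
To make the dynamic time warping idea concrete, here is a minimal sketch in plain Python. It is not aeneas's code: aeneas aligns audio features (MFCCs), whereas this toy version aligns two short lists of numbers. The function name `dtw_path` and the sample sequences are made up for illustration.

```python
def dtw_path(a, b):
    """Align two numeric sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs saying that a[i] was matched with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between frames
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance a only
                                 cost[i][j - 1])      # advance b only

    # Walk back from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical data: "synth" stands in for the TTS features and
# "real" for the recording, which lingers longer on some frames.
synth = [0, 1, 2, 3]
real = [0, 0, 1, 2, 2, 3]
total, path = dtw_path(synth, real)
print(total)  # 0.0
print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]
```

Since you know where each word sits in the synthesized audio, the path tells you which frames of the real recording it maps to, and that gives you the timestamps.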

If you don't have the transcript, you'd use a transcription service that also gives you word-level timestamps. E.g. there was a frontpage submission yesterday where someone used Amazon Transcribe (AWS) to count the number of words in each minute of a talk: https://news.ycombinator.com/item?id=21635939
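
Once you have a transcript with per-word timestamps, finding a word is just a filter. A rough sketch, assuming the JSON looks like what Amazon Transcribe returns (a `results.items` list where each spoken word has `start_time`, `end_time` and `alternatives[0].content`); other services use different field names, so adjust accordingly. The `find_word` helper and the sample data are invented for illustration.

```python
def find_word(transcript, word):
    """Return (start, end) times in seconds for each occurrence of word."""
    target = word.lower()
    hits = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no timestamps, so skip them.
        if item.get("type") != "pronunciation":
            continue
        text = item["alternatives"][0]["content"].lower()
        if text == target:
            hits.append((float(item["start_time"]), float(item["end_time"])))
    return hits


# Hypothetical, hand-written sample in the assumed layout.
sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
             "alternatives": [{"content": "Hello"}]},
            {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
             "alternatives": [{"content": "world"}]},
            {"type": "punctuation",
             "alternatives": [{"content": "."}]},
            {"type": "pronunciation", "start_time": "1.2", "end_time": "1.6",
             "alternatives": [{"content": "hello"}]},
        ]
    }
}

print(find_word(sample, "hello"))  # [(0.0, 0.4), (1.2, 1.6)]
```

In practice you'd load the service's output with `json.load` and pass it in. The match here is exact and case-insensitive, so it won't catch plurals or other variants of the word.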