The main problem with FTS (full-text search) isn't the search indexing component; it's actually the HTML content parser.

There are TONS of projects like Elasticsearch or just raw Lucene that will allow you to parse text and index it.

HTML? Not so much...

There are just too many problems.

Text ads polluting the extracted text are by far the main issue, but there are others: OCR of images, AJAX-paginated pages, lazy-loaded images that might need OCR, and metadata extraction (when was the page published, who was the author, etc.).

There are some projects that take this on, but Google just does an amazing job here, and these secondary tools are pretty limited by comparison.

95% accuracy doesn't help, because that remaining 5% usually ends up being 100% of your false positives.

I've had a lot of success running HTML pages through Mozilla's Readability[0] tool (actually the Go port of it[1]) before indexing.

[0]: https://github.com/mozilla/readability

[1]: https://github.com/go-shiori/go-readability