Ha, yes, I've done that at https://gigablast.com/. The biggest problems now are the following: 1) It's too hard to spider the web. Gatekeeper companies like Cloudflare (owned in part by Google) and CloudFront make it really difficult for upstart search engines to download web pages. 2) Hardware costs are too high. It's now much more expensive to build an index large enough (50B+ pages) to be competitive.

I believe my algorithms are decent, but the biggest problem for Gigablast now is index size. You do a search on Gigablast and think, well, why didn't it return this result that Google found? That's because the index isn't big enough, and the index isn't big enough because I don't have the cash for the hardware. By the way, I've been working on this engine for over 20 years and have probably written 1-2M lines of code for it.

Interesting. I've had some interest in building a search engine myself (just for playing around, of course). A blog post by Michael Nielsen [1] sparked that interest. Do you have any written material about your architecture and the like? I would love to read up on it.

[1]: https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billio...

gbmatt

There's some material here: https://github.com/gigablast/open-source-search-engine