Now, if only they could solve the problem where the text they index does not match the text my browser displays.

I've seen a few searches recently (on DDG) where the indexed text is apparently hidden behind an authentication wall.

Google's cached page feature used to fix those sorts of shenanigans.

Ah, the good ol' "whitelist the Google bot IPs but pop up paywalls for everyone else" trick. It'd be cool if Google randomized their crawler's IPs and made their bot look like just another user; then we'd all get the same viewing experience.

This is a pretty old grey-hat SEO tactic called cloaking. It has evolved from basic obfuscated JS that redirected users into more sophisticated tactics where users are lured in via the SERPs only to be told they must sign in.

This used to be a plague in the SERPs for technical error searches, and for Pinterest in image searches.

The problem with the random-IP proposal is that it's a cat-and-mouse game. If I own 1,000 sites on various topics and cross-reference visits by IP, well, now it's not very random anymore. At one point there was even a service (fantomas) that kept an updated database.
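A minimal sketch of that cross-referencing idea (function and threshold are hypothetical, not any real tool's API): if the same IP shows up on an implausible number of unrelated sites you control, it's a crawler no matter how "random" its addresses look.

```python
from collections import defaultdict

def flag_crawler_ips(visit_log, site_threshold=50):
    """visit_log: iterable of (ip, site) pairs pooled across every site
    you own. Flags IPs seen on at least `site_threshold` distinct sites,
    since a real user rarely visits that many unrelated properties."""
    sites_by_ip = defaultdict(set)
    for ip, site in visit_log:
        sites_by_ip[ip].add(site)
    return {ip for ip, sites in sites_by_ip.items()
            if len(sites) >= site_threshold}

# A crawler hitting 60 of your sites stands out from a normal visitor:
log = [("203.0.113.9", f"site{i}.example") for i in range(60)]
log += [("198.51.100.1", "site0.example"), ("198.51.100.1", "site1.example")]
print(flag_crawler_ips(log))  # {'203.0.113.9'}
```

The countermeasure (and counter-countermeasure) is exactly the cat-and-mouse game described above: the crawler has to either use a huge pool of IPs or visit slowly enough to stay under the threshold.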

This goes very deep in black hat SEO history with mosaic cloaking and other tactics.

I used this as a teen in the early days of Google to make decent pocket change, back in the early 2000s.

It goes even deeper when you get into parasite-hosting tactics with paid backlinks and XRumer.

What's funny is that all of these old-school black-hat SEOs have moved on to social media. PVA social accounts combined with AI rewriting are where the magic is now, and don't even get me started on repurposed content via YouTube. It's no longer economical to tackle the organic SERPs.

Heck, I even have a bot that scans popular linked domains and monitors their expiration dates for a quick snatch, with a nice Grafana UI on it. And I've got quite a few nice domains with easy passive traffic and zero effort. All you do is restore the latest archive.org snapshots and set up a catch-all email for leads.
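The expiration-monitoring half of a bot like that can be done with nothing but the standard library: query the registry's WHOIS server on port 43 and parse out the expiry field. This is a hedged sketch, not the commenter's actual bot; the server name is the Verisign registry for .com/.net, and other TLDs use different servers and field names.

```python
import socket
from datetime import datetime

def whois_query(domain, server="whois.verisign-grs.com"):
    """Raw WHOIS lookup over TCP port 43 (server assumed for .com/.net)."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(f"{domain}\r\n".encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def parse_expiry(whois_text):
    """Pull the 'Registry Expiry Date' field out of a raw WHOIS response."""
    for line in whois_text.splitlines():
        if "Registry Expiry Date:" in line:
            raw = line.split(":", 1)[1].strip()
            return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return None

# Parsing demo on a canned response (no network needed):
sample = "Domain Name: EXAMPLE.COM\nRegistry Expiry Date: 2026-08-13T04:00:00Z\n"
print(parse_expiry(sample))  # 2026-08-13 04:00:00+00:00
```

Run `parse_expiry(whois_query("example.com"))` on a schedule, export the dates as metrics, and the Grafana dashboard on top is just a panel over those timestamps.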

DALL-E and GPT-3 are going to turn these into legit businesses.

My point is, if there is money to be made, people will automate to the extreme. If you dig deep you'll realize an IP address doesn't mean much; hence reCAPTCHA.

> Heck, I even have a bot that scans popular linked domains and monitors their expiration dates for a quick snatch, with a nice Grafana UI on it. And I've got quite a few nice domains with easy passive traffic and zero effort. All you do is restore the latest archive.org snapshots and set up a catch-all email for leads.

That's crazy. How does restoring an archive.org snapshot work? Do you just dump static HTML on the domain?

Are those mostly just content sites?

There are a few tools out there ready to go. Here's one:

https://github.com/hartator/wayback-machine-downloader

You just dump the site, sync it to S3, and use Terraform to provision the Route 53 and bucket setup.
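Assuming the tool linked above plus the AWS CLI and Terraform are installed, the pipeline described might look roughly like this (domain, bucket name, and local paths are placeholders, and the Terraform config is your own):

```shell
# Pull the latest archive.org snapshot of the expired domain;
# the gem writes static files to ./websites/<domain>/
wayback_machine_downloader https://example-expired.com

# Push the restored static site to an S3 bucket
aws s3 sync ./websites/example-expired.com s3://example-expired.com

# Provision the bucket website config and Route 53 records
# declared in your .tf files
terraform init && terraform apply
```

This is a deployment sketch, not a turnkey script: the bucket still needs static-website hosting enabled and the Route 53 zone pointed at it, which is what the Terraform step is for.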

Yes, they are mostly content sites. The hardest part is filtering out adult domains, assuming you don't want them. There is a staggering number of adult domains that expire every year and still get huge traffic.