> Published content will be later used to train subsequent models, and being able to distinguish AI from human input may be very valuable going forward
I find this to be a particularly interesting problem in this whole debacle.
Could AI quality trend downward due to models ingesting their own old outputs and reinforcing bad habits? I think text generation is particularly at risk.
I've already run into scenarios where ChatGPT generated code that looked perfectly plausible, except that the API it called didn't actually exist.
Now imagine myriad fake blogs using ChatGPT under the hood to generate entries explaining how to solve commonly searched-for problems, which then get spidered and fed into ChatGPT 2.0. That could create a downward trend in quality, as more and more junk gets posted, absorbed into the model, and amplified further.
I think image generation should be less vulnerable to this: images need tagging to be useful, "AI generated" is a common tag that can be used to exclude old outputs from reingestion, and precision matters less for artwork. If people like the results, it doesn't matter much that something isn't drawn realistically.
As someone in SEO, I've been pretty disgusted by site owners' eagerness to use AI-generated content. There are various opinions on this, of course, but I got into SEO out of interest in the "organic web", as opposed to everything being driven by ads.
I love the idea of AI-free content declarations, as they could help differentiate organic content from generated content. It would be very interesting if companies and site owners self-certified their sites as organic with something like an /ai-free.txt.
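A minimal sketch of what such a file could look like, loosely modeled on robots.txt (every field here is hypothetical; no such standard exists):

    # /ai-free.txt -- hypothetical self-certification, no standard exists yet
    Version: 1
    Policy: no-generative-ai   # no AI-generated text or images published here
    Scope: /                   # applies to the entire site
    Contact: webmaster@example.com
    Last-Audited: 2023-01-15

Like robots.txt, it would be purely self-asserted, which is exactly the weakness raised below.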
I don't see the point. There's lots of old content out there that will never get tagged, so lacking the tag doesn't mean a site is AI-generated. Meanwhile, people abusing AI for profit (e.g., generating AI-driven blogs to stick ads on) wouldn't want to tag their sites in a way that might get them ignored.
And what are the consequences for lying?
Does use of a search engine violate the "No AI" covenant with oneself?
Variation on the Turing Test: prove that it's not a human claiming to be a computer.
Modeling premises and meta-analysis are again necessary elements of critical reasoning about sources and methods, and about superpositions of ignorance and malice.
Maybe this could encourage a recreation of the original Yahoo! (If you don't remember, Yahoo! started out not as a search engine in the Google sense but as a collection of human-curated links to websites on various topics.)
I consider Wikipedia to be a massive curated set of information. It also includes a lot of references and links to additional good information and source materials. Companies try to get spin added, and it's usually very well controlled. I worry that a lot of AI-generated dreck will seep into Wikipedia, but I'm hopeful the moderation will continue to function well.
Distributed Version Control > Work model > Pull Request: https://en.wikipedia.org/wiki/Distributed_version_control#Pu...
sindresorhus/awesome: https://github.com/sindresorhus/awesome#contents
bayandin/awesome-awesomeness: https://github.com/bayandin/awesome-awesomeness
"Help compare Comment and Annotation services: moderation, spam, notifications, configurability" https://github.com/executablebooks/meta/discussions/102
Re: fact checks, schema.org/ClaimReview, W3C Verifiable Claims, W3C Verifiable News & Epistemology: https://news.ycombinator.com/item?id=15529140
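For concreteness, here's a minimal schema.org/ClaimReview sketch in JSON-LD (the URL, organization name, and rating values are made up for illustration):

    {
      "@context": "https://schema.org",
      "@type": "ClaimReview",
      "url": "https://example.com/fact-checks/123",
      "claimReviewed": "Example claim being fact-checked",
      "author": {
        "@type": "Organization",
        "name": "Example Fact Checker"
      },
      "reviewRating": {
        "@type": "Rating",
        "ratingValue": "1",
        "bestRating": "5",
        "worstRating": "1",
        "alternateName": "False"
      }
    }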
W3C Web Annotations could contain cryptographically signed Verifiable Claims (optionally signed with a W3C DID); comments as signed Linked Data.
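As a rough sketch of how the two data models could compose, a Web Annotation whose body embeds a signed Verifiable Credential might look like this (the DID, proof fields, and claim wording are illustrative assumptions, not a published profile):

    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "type": "Annotation",
      "target": "https://example.com/article/1",
      "body": {
        "@context": "https://www.w3.org/2018/credentials/v1",
        "type": ["VerifiableCredential"],
        "issuer": "did:example:alice",
        "issuanceDate": "2023-01-01T00:00:00Z",
        "credentialSubject": {
          "id": "https://example.com/article/1",
          "claim": "This article was written without generative AI."
        },
        "proof": {
          "type": "Ed25519Signature2020",
          "verificationMethod": "did:example:alice#key-1",
          "proofValue": "...signature elided..."
        }
      }
    }

A verifier could resolve the issuer's DID document, fetch the public key, and check the proof, making the annotation's claim attributable rather than anonymous.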