It's unfortunate that this guy was harassed for releasing these uncensored models. It's pretty ironic, for people who are supposedly so concerned about "alignment" and "morality" to threaten others.

"Alignment", as used by most grifters on this train, is a crock of shit. You only need to get so far as the stochastic parrots paper, and there it is in plain language. "reifies older, less-inclusive 'understandings'", "value lock", etc. Whose understandings? Whose values?

Maybe they should focus on real problems that will result from these technologies instead of some science fiction thought experiments about language models turning the solar system into paperclips, and perhaps less about how the output of some predictions might hurt some feelings.

What's also very unfortunate is overloading the term "alignment" with a different meaning, which generates a lot of confusion in AI conversations.

The "alignment" talked about here is just usual petty human bickering. How to make the AI not swear, not enable stupidity, not enable political wrongthing while promoting political rightthing, etc. Maybe important to us day-to-day, but mostly inconsequential.

Before LLMs and ChatGPT exploded in popularity and got everyone opining on them, "alignment" meant something else. It meant how to make an AI that doesn't talk us into letting it take over our infrastructure, or secretly bootstrap nanotechnology[0] to use for its own goals, which may include strip-mining the planet and disassembling humans. These kinds of things. Even lower on the doom-scale, it meant training an AI that wouldn't creatively misinterpret our ideas in ways that lead to death and suffering, simply because it wasn't able to correctly process or value these concepts and how they work for us.

There is some overlap between the two uses of this term, but it isn't that big. If anything, it's the attitudes that start to worry me. I'm all for open source and uncensored models at this point, but there's no clear boundary for when it stops being about "anyone should be able to use their car or knife like they see fit", and becomes "anyone should be able to use their vials of highly virulent pathogens[1] like they see fit".

----

[0] - The go-to example of Eliezer is AI hacking some funny Internet money, using it to mail-order some synthesized proteins from a few biotech labs, delivered to a poor schmuck who it'll pay for mixing together the contents of the random vials that came in the mail... bootstrapping a multi-step process that ends up with generic nanotech under control of the AI.

I used to be of two minds about this example - it both seemed totally plausible and pure sci-fi fever dream. Recent news of people successfully applying transformer models to protein synthesis tasks, with at least one recent case speculating the model is learning some hitherto unknown patterns of the problem space, much like LLMs are learning to understand concepts from natural language... well, all that makes me lean towards "totally plausible", as we might be close to an AI model that understands proteins much better than we do.

[1] - I've seen people compare strong AIs to off-the-shelf pocket nuclear weapons, but that's a bad take, IMO. Pocket off-the-shelf bioweapon is better, as it captures the indefinite range of spread an AI on the loose would have.

I agree with you that "AI safety" (let's call it bickering) and "alignment" should be separate. But I can't stomach the thought experiments. First of all, it takes a human being to guide these models, to host (or pay for the hosting) and instantiate them. They're not autonomous. They won't be autonomous. The human being behind them is responsible.

As far as the idea of "hacking some funny Internet money, using it to mail-order some synthesized proteins from a few biotech labs, delivered to a poor schmuck who it'll pay for mixing together the contents of the random vials that came in the mail... bootstrapping a multi-step process that ends up with generic nanotech under control of the AI.":

Language models, let's use GPT-4, can't even use a web browser without tripping over itself. My web browser setup, which I've modified to use the chrome visual assistance over the debug bridge now, if you so much as increase the pixels of the viewport by 100 or so, the model is utterly perplexed because it's lost its context. Arguably, that's an argument from context, which is slowly being made irrelevant with even local LLMs (https://www.mosaicml.com/blog/mpt-7b). It has no understanding, it'll use an "[email protected]" to try and login to websites, because it believes that this is its email address. It has no understanding that it needs to go register for email. Prompting it with some email access and telling it about its email address just papers over the fact that the model has no real understanding across general tasks. There may be some nuggets of understanding in there that it has gleaned for specific task from the corpus, but AGI is a laughable concern. These are trained to minimize loss on a dataset and produce plausible outputs. It's the Chinese room, for real.

It still remains that these are just text predictions, and you need a human to guide them towards that. There's not going to be autonomous machiavellian rogue AIs running amok, let alone language models. There's always a human being behind that.

As far as multi-modal models and such, I'm not sure, but I do know for sure that these language models don't have general understanding, as much as Microsoft and OpenAI and such would like them to. The real harm will be deploying these to users when they can't solve the prompt injection problem. The prompt injection thread here a few days ago was filled with a sad state of "engineers", probably those who've deployed this crap in their applications, just outright ignoring the problem or just saying it can be solved with "delimiters".

AI "safety" companies springing up who can't even stop the LLM from divulging a password it was supposed to guard. I broke the last level in that game with like six characters and a question mark. That's the real harm. That, and the use of machine learning in the real world for surveillance and prosecution and other harms. Not science fiction stories.

>They're not autonomous. They won't be autonomous.

https://github.com/Significant-Gravitas/Auto-GPT

>This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

>As an autonomous experiment, Auto-GPT may generate content or take actions that are not in line with real-world business practices or legal requirements. It is your responsibility to ensure that any actions or decisions made based on the output of this software comply with all applicable laws, regulations, and ethical standards. The developers and contributors of this project shall not be held responsible for any consequences arising from the use of this software.