They didn't mention gpt-4-32k. Does anybody know if it will be generally available in the same timeframe?

There's still no news about the multi-modal gpt-4. I guess the image input is just too expensive to run or it's actually not as great as they hyped it.

>I guess the image input is just too expensive to run or it's actually not as great as they hyped it.

We already know they have a SOTA model that can turn images into latent-space vectors without being some insane resource hog - in fact, they give it away to competitors like Stability. [0]
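For context on why that's cheap: CLIP embeds images and text into a shared vector space, so matching an image against a description reduces to comparing unit-normalized vectors. A toy sketch of that comparison step (the vectors below are made up for illustration; a real pipeline would get them from `model.encode_image` / `model.encode_text` in the repo at [0], where embeddings are 512-dimensional):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # CLIP-style matching: normalize both embeddings to unit length,
    # then take the dot product.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-ins for CLIP outputs (hypothetical 3-d toy vectors, not real embeddings).
image_embedding = np.array([0.9, 0.1, 0.2])
matching_caption = np.array([0.8, 0.2, 0.1])
unrelated_caption = np.array([-0.1, 0.9, -0.3])

# The matching caption should score higher than the unrelated one.
assert cosine_similarity(image_embedding, matching_caption) > \
       cosine_similarity(image_embedding, unrelated_caption)
```

The expensive part is training the encoders; running inference on a frozen CLIP is a single forward pass per image, which is why it isn't a resource hog.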

My guess is that a limited set of people are using the GPT-4 + CLIP hybrid, but those use cases mostly involve deciphering pictures of text (which CLIP would be very bad at), so they're working on that (or on other use-case problems).

[0] https://github.com/openai/CLIP