Google: Gemini image generation got it wrong. We'll do better.

Google Gemini’s image generation capabilities, powered by the Imagen 2 model, have been the subject of controversy over the past week. As Google Senior Vice President Prabhakar Raghavan explained: “[B]ecause our users come from all over the world, we want [Gemini] to work well for everyone. If you ask for a picture of football players, or someone walking a dog, you may want to receive a range of people. You probably don’t just want to only receive images of people of just one type of ethnicity (or any other characteristic).

“However, if you prompt Gemini for images of a specific type of person — such as ‘a Black teacher in a classroom,’ or ‘a white veterinarian with a dog’ — or people in particular cultural or historical contexts, you should absolutely get a response that accurately reflects what you ask for.

“So what went wrong? In short, two things. First, our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range. And second, over time, the model became way more cautious than we intended and refused to answer certain prompts entirely — wrongly interpreting some very anodyne prompts as sensitive.” You can see some of the resulting errors here.

In the wake of these errors, Gemini users currently cannot use the tool to generate images of people (although the feature is expected to return within a few weeks). Google’s parent company, Alphabet, also took a hit on the stock market yesterday: its shares fell 4.5%, wiping out roughly $90 billion in market value.

Microsoft invests in Europe’s Mistral AI to expand beyond OpenAI

Just yesterday, Microsoft announced a €15 million investment in French generative AI startup Mistral AI, which has been releasing open-source conversational AI models for roughly the past year. As part of the deal, Mistral AI will host its models on Microsoft’s Azure cloud computing platform. The deal is also expected to drive users to Mistral AI’s new conversational AI platform, Le Chat (French for “the cat”), which is positioned as a rival to ChatGPT.

Stable Diffusion 3.0 debuts new diffusion transformer architecture to reinvent text-to-image gen AI

Stability AI has launched a preview of Stable Diffusion 3.0, the latest version of its text-to-image model. Whereas previous versions of Stable Diffusion were built on a U-Net backbone, version 3.0 is based on a new architecture called a diffusion transformer. One notable improvement is in rendering legible text within images, a notorious challenge for earlier image generation models. Stable Diffusion 3.0 will come in a range of model sizes, from 800 million to 8 billion parameters.
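The core idea behind a diffusion transformer can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not Stability AI’s code: it assumes the model works on a latent image stored as channels × height × width, cuts that latent into square patches, flattens each patch into a token, and then lets every token attend to every other token with one round of scaled dot-product self-attention. The names `patchify` and `self_attention` are invented for this example, and a real model adds learned projections, noise-level and text conditioning, and many stacked layers.

```python
import math
from typing import List

Token = List[float]
Latent = List[List[List[float]]]  # indexed as [channel][row][column]


def patchify(latent: Latent, patch: int) -> List[Token]:
    """Cut a C x H x W latent into non-overlapping patch x patch squares and
    flatten each square (across all channels) into one token vector."""
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):  # patches in row-major order
        for left in range(0, width, patch):
            tokens.append([
                latent[c][y][x]
                for c in range(channels)
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            ])
    return tokens


def self_attention(tokens: List[Token]) -> List[Token]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the tokens
    themselves; a trained transformer applies learned projections first."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a weighted average of all tokens, so every patch
        # can draw on information from every other patch in the image.
        outputs.append([sum(w * tok[i] for w, tok in zip(weights, tokens))
                        for i in range(dim)])
    return outputs


if __name__ == "__main__":
    # A toy 1-channel 4 x 4 "latent" holding the values 0..15.
    latent = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
    tokens = patchify(latent, patch=2)
    print(len(tokens), tokens[0])  # 4 tokens; the first is [0.0, 1.0, 4.0, 5.0]
    mixed = self_attention(tokens)
    print(len(mixed), len(mixed[0]))
```

Treating patches as tokens is what lets a transformer, rather than a convolutional U-Net, process the image: attention relates any patch to any other regardless of distance.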

ChatGPT went berserk, giving nonsensical responses all night

Last Tuesday, ChatGPT users found that the tool was behaving erratically, spouting gibberish in response to their prompts. By Wednesday, the issue had been resolved. According to an announcement by OpenAI, “An optimization to the user experience introduced a bug with how the model processes language.”

Google cut a deal with Reddit for AI training data

As announced last Thursday, Google is partnering with Reddit to obtain training data for its AI models. According to the announcement, “The collaboration will give Google access to Reddit’s data API, which delivers real-time content from Reddit’s platform.”

Tool of the week: Stable Diffusion 3.0

In the news items above, we noted last week’s preview release of Stable Diffusion 3.0, which reportedly outperforms both Midjourney and DALL-E 3 by a significant margin. Read more about that apparent difference in performance below.

AI-generated image of the week

I (Chris Snider) was showing my web design students how to create repeating backgrounds for their websites using Midjourney. The key is just adding --tile to the end of your prompt. Here’s an example:

Prompt: bulldogs and basketballs --tile 
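To show what “repeating” means for a background, the short Python sketch below models an image as a grid of grayscale numbers. It is a hypothetical illustration, not part of Midjourney: `tile_image` repeats a grid the way a browser repeats a background image, and `seam_error` is an assumed, rough heuristic that compares each edge with the opposite edge it will touch once the image repeats. In principle, an image generated with --tile should show little mismatch at those seams, while an ordinary image often shows a visible jump.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0-255


def tile_image(img: Image, across: int, down: int) -> Image:
    """Repeat img `across` times horizontally and `down` times vertically,
    like a web page background that repeats in both directions."""
    return [row * across for _ in range(down) for row in img]


def seam_error(img: Image) -> float:
    """Rough seamlessness check (hypothetical heuristic).

    When the image repeats, its right column sits next to its left column and
    its bottom row sits above its top row. Average the absolute differences
    across those two seams: 0 means the opposite edges match exactly."""
    height, width = len(img), len(img[0])
    diffs = [abs(img[y][width - 1] - img[y][0]) for y in range(height)]
    diffs += [abs(img[height - 1][x] - img[0][x]) for x in range(width)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    seamless = [[10, 20, 10], [20, 30, 20], [10, 20, 10]]
    gradient = [[0, 128, 255], [0, 128, 255], [0, 128, 255]]
    print(seam_error(seamless))  # 0.0: opposite edges match
    print(seam_error(gradient))  # 127.5: a hard jump where the copies meet
    print(len(tile_image(seamless, 2, 2)))  # 6 rows in a 2 x 2 repeat
```

Real images would be loaded with an imaging library and compared per color channel; the plain lists here just keep the idea visible.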

Generative AI tip of the week: Being Nice

A recent study suggests that being polite to your language model might improve its performance.

Text-to-2D-platformer generative AI? Yes, please!