Google recently introduced its Gemini AI model and shared a video titled "Hands-on with Gemini: Interacting with multimodal AI." The demo, which showcased Gemini's multimodal capabilities, was revealed to be misleading.
The video made it look like the AI could understand and respond to different types of input, such as voice and pictures, in real time. But the video was not a live demonstration. It was edited to make Gemini appear to respond instantly to people and objects on camera.
The demonstration involved no genuine voice interaction and no live responses to visual stimuli. Instead, Google took still image frames from the footage and prompted Gemini via text, producing a portrayal that diverged from how the AI actually functions.
For instance, the video showed the model quickly recognizing hand gestures as part of a game of Rock, Paper, Scissors. However, it was revealed that the actual prompt required showing all three gestures at once and providing a hint, suggesting a more manufactured interaction than initially depicted. Similarly, other interactions depicted in the video were not as spontaneous or intuitive as implied.
Google's explanation that the video "shows real outputs from Gemini" has been met with skepticism, as it has become clear that the staged interactions do not accurately represent the model's actual capabilities. The controversy over the misrepresented demo has raised questions about the transparency and honesty of AI presentations from Google and, potentially, the broader AI industry.
Later, on social media, Oriol Vinyals, VP of Research at Google DeepMind, explained that the video was made to inspire developers. Still, the gap between the staged demo and what Gemini can really do has made people question whether Google can be trusted.
Oriol Vinyals (@OriolVinyalsML), December 7, 2023:
Really happy to see the interest around our “Hands-on with Gemini” video. In our developer blog yesterday, we broke down how Gemini was used to create it. https://t.co/50gjMkaVc0
We gave Gemini sequences of different modalities — image and text in this case — and had it respond… pic.twitter.com/Beba5M5dHP