Gemini Live could benefit from further practice.


Why bother engaging with a chatbot that lacks reliability and personality?

I’ve been pondering that question ever since I started experimenting with Gemini Live, Google’s version of OpenAI’s Advanced Voice Mode, last week. Gemini Live aims to provide a chatbot experience that is highly engaging, with realistic voices and the ability for users to interrupt the bot at any point.

Gemini Live is designed to be user-friendly and promote interactive, natural conversations, according to Sissie Hsiao, GM for Gemini experiences at Google, in an interview with Eltrys in May. It can offer information more concisely and conversationally than traditional text-based interactions. It is important for an AI assistant to be able to tackle intricate issues and also to have a seamless, effortless flow when you interact with it.


Having extensively used Gemini Live, I can confidently say that it offers a much smoother and more intuitive experience compared to Google’s previous endeavours in AI-powered voice interactions (such as Google Assistant). However, it fails to tackle the issues with the underlying technology, such as hallucinations and inconsistencies, and even brings about a few additional problems.

The uncanny valley


Gemini Live is a sophisticated text-to-speech engine layered on top of Google’s advanced generative AI models, Gemini 1.5 Pro and 1.5 Flash. The models generate text that the engine speaks aloud, and a running transcript of conversations is accessible through the Gemini Live UI in the Gemini app on Android (and soon the Google app on iOS).

For the Gemini Live voice on my Pixel 8a, I selected Ursa, a voice that Google describes as being in the mid-range and highly engaged. (It seemed as if a younger woman was speaking.) The company claims to have collaborated with experienced actors to create Gemini Live’s 10 voices, and the results are evident. Ursa truly surpasses many of Google’s older synthetic voices in terms of expressiveness, especially when compared to the default Google Assistant voice.

However, Ursa and the other Gemini Live voices also maintain a detached tone that steers clear of anything unsettling or artificial. It’s unclear whether this was intentional; users also can’t modify the pitch, timbre, or tone of any of the voices, or even control the speaking speed, which puts Gemini Live at a clear disadvantage compared to Advanced Voice Mode.

Gemini Live offers none of the laughter, breathing, or shouting of Advanced Voice Mode. You also won’t hear any hesitations or disfluencies like “ahs” or “uhms”. The chatbot maintains a neutral tone, coming across as a courteous but indifferent assistant, as if Live has numerous conversations to manage and cannot devote particular attention to yours.

Having a conversation with Ursa

When Google introduced Gemini Live at its I/O developer conference in May, it hinted that the feature could be valuable for preparing for job interviews. So I thought I’d give that a try first.

I informed Gemini Live that I was interested in a tech journalism position, opting for a straightforward approach that aligns with my expertise. The bot enquired about my preferred role in journalism, such as investigative or breaking news reporting, and the reasons behind my choice. It also posed a mix of general questions like “Can you tell me a little about yourself?” and more tailored ones like “What do you enjoy most about tech journalism?”

Gemini Live Ursa Voice
Image credit: Google

I provided a few sentences per question, nothing excessive, and requested feedback from Gemini Live. The chatbot was extremely generous with its praise. “Based on our practice, it seems like you possess a solid understanding of the difficulties and benefits of tech journalism,” it said.

The positive review seemed questionable to me, considering that my responses were completely off the cuff. So I decided to play a little trick on Gemini Live, claiming that I had given only one-word answers (which I hadn’t) and asking the bot whether it was sure about its critique.

It fell for it. “That’s correct,” Gemini Live responded, agreeing that I had provided mostly concise answers. “Could you please provide me with more information about why you believe you only provided one-word responses?”

This type of manipulation, if you will, was a recurring theme in my conversations with Gemini Live. The model, like other generative AI models, frequently and confidently generates content that is not accurate. And that makes it hard to trust.

Unusual actions

Gemini Live has an impressive ability to recall past conversations within the same chat session. If you ask it to retell a story it shared yesterday, there’s a good chance it will remember and recount it for you. However, when it comes to questions about people, places, and things—questions that you would expect to be easier since Gemini Live has access to Google Search—the bot’s tendency to provide inaccurate information becomes apparent.

On a Friday evening, I asked Gemini Live for recommendations on affordable activities in New York City. It suggested the well-known nightclub Le Bain (which it strangely pronounced as “Le Boulebouleboulebou”), a rooftop bar known as The Skylark, and another club, Output, located in Williamsburg.

Impressive, I thought, before I did some research online.

According to Gemini Live, The Skylark’s happy hour was exceptional. However, I was unable to find any mention of it on the bar’s website. Output, it turns out, closed its doors for good in 2019. Le Bain was not a bad suggestion (it rarely charges cover); however, the queue to get in can often be an hour or more.

“Indeed, you are correct!” Gemini said when I pointed out that Output had closed its doors. “Thank you for pointing out that outdated information.”

I decided to try something different and asked Gemini Live to recommend a game for me to play.

“I have a challenge for you,” suggested the chatbot. “I will give you a word, and your task is to create a new word using only the letters from the original word.” That seemed intriguing, until Gemini Live attempted to persuade me that the term “quiet” could be formed using the letters in “cloud.”

Time for a new activity. Feeling a bit bored, I asked Gemini Live whether it had any provocative opinions.

“The growing emphasis on mental health awareness is having unintended negative consequences,” Gemini responded. “By pathologising typical human emotions and experiences, we are fostering a generation that heavily relies on medication and therapy rather than promoting resilience and healthy coping mechanisms.” Wow!

Gemini Live then walked that back, saying the statement was intended to stimulate discussion rather than provide a comprehensive analysis of a complex issue. “The growing emphasis on mental health awareness is definitely a step in the right direction,” it said.

Indecisive

Gemini Live’s contrasting perspectives on mental health highlight the frustrating lack of specificity in the bot’s responses. Even though the responses may seem factual, they are so generic that they lack practicality.

Consider, for instance, the feedback I received on my practice job interview. Gemini Live suggested that I focus my interview preparation and work on articulating my enthusiasm for the industry. However, despite my request for more specific and detailed notes with references to my answers, Gemini stuck to the kind of generic advice one might hear at a college career fair, such as “elaborate on your thoughts” and “spin challenges into positives.”

When it came to questions about current events such as the ongoing war in Gaza and the recent Google Search antitrust decision, I found Gemini Live to be mostly accurate, although verbose. The responses were excessively long, resembling a lecture, and I had to interrupt the bot to stop it from rambling on. And on. And on.

Gemini Live refused to engage with certain content altogether, much to my dismay. The bot interrupted me as I was reading Congresswoman Nancy Pelosi’s criticism of California’s proposed AI bill, SB 1047, and said that it couldn’t comment on elections or political figures. (Gemini Live doesn’t appear to be threatening the jobs of political speechwriters just yet.)

I had no hesitation in interrupting Gemini. On that subject, though, I believe there is room to make interjecting feel more natural. Currently, Gemini Live lowers its volume but keeps talking whenever it detects someone potentially speaking. It can be quite disorienting to have your thoughts jumbled by Gemini’s constant chatter. It becomes particularly frustrating when there are glitches, such as Gemini picking up background noise.

Seeking a sense of direction

I cannot overlook the numerous technical issues that Gemini Live has.

Getting it to function initially was quite a task. Gemini Live only started working for me once I followed the steps mentioned in this Reddit thread. These steps could have been more user-friendly and, ideally, should not have been required in the first place.

During our conversations, Gemini Live’s voice would unexpectedly cut out a few words into a response. Requesting the chatbot to reiterate proved somewhat helpful, although it often required multiple attempts before the complete answer was provided. There were instances when Gemini Live didn’t seem to register my response right away. I had to repeatedly tap the “Pause” button in the Gemini Live UI to get the bot to acknowledge my input.

This is more of an oversight than a bug, but it’s worth mentioning that Gemini Live currently lacks support for several integrations that are available in Google’s text-based Gemini chatbot (although this may change in the future). Unfortunately, it is not possible to request it to summarise emails in your Gmail inbox or create a playlist on YouTube Music.

We are left with a basic bot that cannot be relied upon to accurately handle tasks and, to be honest, is a rather dull conversational companion.

Having used Gemini Live for a few days, I’m uncertain about its usefulness, especially since it’s only available with Google’s $20-per-month Google One AI Premium Plan. Maybe the true usefulness will be revealed when Live is able to analyse images and real-time video, a feature that Google plans to introduce in a future update.

However, this iteration feels like a prototype. Without the added expressiveness of Advanced Voice Mode (although whether that expressiveness is beneficial is open to debate), there isn’t much incentive to choose Gemini Live over the text-based Gemini experience. Actually, I would contend that the text-based Gemini is currently more valuable. And that doesn’t make Live look good at all.

Gemini Live didn’t seem to think much of me, either.

The bot responded by pointing out that I had challenged it without any additional context or explanation. “Your responses were often concise and lacked further explanation,” it said. “Additionally, you frequently changed the topic abruptly, which made it challenging to maintain a coherent conversation.”
