This German NGO is building an open voice assistant.

Several open-source, AI-powered voice assistants (Rhasspy, Mycroft, and Jasper among them) have aimed to provide privacy-preserving, offline experiences without compromising functionality. However, progress has been sluggish. On top of the usual hurdles of open-source projects, programming an assistant is difficult, and Google Assistant, Siri, and Alexa have years, if not decades, of R&D and massive infrastructure behind them.

However, the Large-scale Artificial Intelligence Open Network (LAION), a German organization that maintains popular AI training data sets, is undeterred. LAION launched BUD-E last month to provide a “fully open” voice assistant that can run on consumer hardware.

Why start a new voice assistant project when so many have been abandoned? Wieland Brendel, an ELLIS Institute fellow and BUD-E developer, believes there is no open assistant with an extensible architecture that can take full advantage of GenAI technologies, notably large language models (LLMs) like OpenAI’s ChatGPT.

Brendel told Eltrys in an email interview that most interactions with assistants rely on clunky chat interfaces and feel stilted and unnatural. “Those systems are fine for controlling music or lights, but not for long, engaging conversations. BUD-E aims to create a voice assistant that seems more natural to people, mimics human speech patterns, and recalls prior talks.”

Brendel added that LAION wants to ensure every BUD-E component can be used license-free, even commercially, which is not necessarily the case for previous open assistant projects.

BUD-E, short for “Buddy for Understanding and Digital Empathy,” is a collaboration with the ELLIS Institute in Tübingen, tech consultancy Collabora, and the Tübingen AI Center, and it has an ambitious roadmap. A blog post by the LAION team outlines its goals for the next several months, including adding “emotional intelligence” to BUD-E and making it capable of handling conversations with multiple speakers.

“A well-working natural voice assistant is needed,” Brendel stated. He added that LAION has a record of building communities, and that the ELLIS Institute Tübingen and the Tübingen AI Center are committed to funding the assistant's development.

BUD-E can be downloaded from GitHub and installed on Ubuntu or Windows PCs (macOS support is coming), but it is still under development.

LAION assembled an MVP from several open models: Microsoft’s Phi-2 LLM, Columbia’s StyleTTS2 for text-to-speech, and Nvidia’s FastConformer for speech-to-text. As a result, the experience is not yet optimized. Getting BUD-E to respond to commands within about 500 milliseconds, in the range of Google Assistant and Alexa, requires a powerful GPU like Nvidia’s RTX 4090.

Collabora is working pro bono to adapt its open-source speech recognition and text-to-speech models, WhisperLive and WhisperSpeech, for BUD-E.

Jakub Piotr Cłapa, an AI researcher at Collabora and BUD-E project member, said that building its own text-to-speech and speech recognition systems lets the team customize them more than closed models available via APIs would allow. “Collabora began working on open assistants because we couldn’t find a viable text-to-speech solution for an LLM-based voice agent for a client. We joined together with the open source community to make our models more accessible and useful.”

In the near term, LAION plans to reduce BUD-E’s hardware requirements and cut the assistant’s latency. Its longer-term projects are building a data set of dialogs to fine-tune BUD-E, a memory mechanism to store information from earlier conversations, and a speech processing pipeline that can keep track of several people talking at once.

I asked the developers whether accessibility was a priority, as speech recognition systems have historically struggled with languages other than English and accents that aren’t Transatlantic. A Stanford study found that speech recognition systems from Amazon, IBM, Google, Microsoft, and Apple were nearly twice as likely to mishear Black speakers as white speakers of the same age and gender.

Brendel said LAION isn’t neglecting accessibility, but that it isn’t an immediate focus for BUD-E.

Brendel added, “The first focus is on really redefining the experience of how we interact with voice assistants before generalizing that experience to more diverse accents and languages.”

LAION has some wild ideas for BUD-E, including an animated avatar to personify the assistant and webcam-based analysis of users’ faces to gauge their emotional state.

The ethics of facial analysis are, of course, fraught. LAION co-founder Robert Kaczmarczyk said the organization would remain committed to safety.

“[We] adhere strictly to the safety and ethical guidelines formulated by the EU AI Act,” he emailed Eltrys, referring to the legal framework governing the sale and use of AI in the EU. The EU AI Act allows member states to adopt more restrictive rules and safeguards for “high-risk” AI, including emotion classifiers.

“This commitment to transparency not only facilitates the early identification and correction of potential biases but also aids scientific integrity,” Kaczmarczyk said. “By sharing our data sets, we allow the scientific community to conduct high-reproducibility research.”

LAION is working on contentious emotion-detection technology and has an ethically questionable track record. Maybe BUD-E will be different. We’ll see.

Author: Juliet P.