Give Your Chat Agent a Voice — Luke Harries, ElevenLabs

Luke Harries from ElevenLabs presents Voice Engine, a solution that easily adds natural, context-aware voice capabilities to existing chat agents without requiring extensive redevelopment, enhancing accessibility and interaction across multiple channels. He emphasizes that while 2023 was the year of chat agents, the future of conversational AI lies in upgrading these agents with voice to create more natural and efficient user experiences.

The talk opens with the evolution of chat agents and the case for giving them a voice. Luke notes that 2023 was the year of chat agents, with many companies adopting chat interfaces as the primary way to interact with AI. He argues that chat is useful, but voice is a more natural, faster, and more accessible medium, and that it opens up channels chat cannot reach, such as joining Zoom calls or handling phone support.

ElevenLabs initially focused on building the best text-to-speech models but soon realized that many companies already had sophisticated chat agents built with various integrations like LLMs, retrieval-augmented generation (RAG), and tool calling. Rather than forcing companies to rebuild their agents from scratch, ElevenLabs developed a new product called Voice Engine, which acts as a first-class primitive that can easily wrap around existing chat agents to add voice capabilities without extensive redevelopment.

Voice Engine combines state-of-the-art speech-to-text and text-to-speech models with advanced turn-taking that is context- and emotion-aware, and it supports thousands of voices and languages. The developer experience is a key focus: a simple server SDK lets developers attach the voice engine to their existing chat agents with minimal code. Additionally, a client SDK enables easy embedding of voice widgets on websites, and the system supports telephony and SaaS integrations out of the box.
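
The wrapping idea described above can be illustrated with a minimal sketch. This is not the Voice Engine SDK: the talk summary does not show its API, so every name here (`VoiceWrapper`, `handle_turn`, `fake_stt`, `fake_tts`, `support_agent`) is a hypothetical stand-in, and the speech models are replaced by trivial byte/string conversions so the example runs on its own. It also leaves out turn-taking and streaming entirely. The point is only that the existing chat agent stays untouched while speech-to-text and text-to-speech are attached on either side of it.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type aliases; none of these names come from the ElevenLabs SDK.
ChatAgent = Callable[[str], str]        # existing text-in, text-out agent
SpeechToText = Callable[[bytes], str]   # audio -> transcript
TextToSpeech = Callable[[str], bytes]   # reply text -> audio


@dataclass
class VoiceWrapper:
    """Wraps an unchanged chat agent with speech on both sides."""
    agent: ChatAgent
    stt: SpeechToText
    tts: TextToSpeech
    transcript: List[str] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        # 1. Transcribe the caller's audio.
        user_text = self.stt(audio_in)
        self.transcript.append(f"user: {user_text}")
        # 2. Call the existing chat agent exactly as a chat UI would.
        reply_text = self.agent(user_text)
        self.transcript.append(f"agent: {reply_text}")
        # 3. Synthesize the reply back into audio.
        return self.tts(reply_text)


# Stand-ins so the sketch runs without any real speech model.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


# Placeholder for an existing chat agent (LLM, RAG, tool calling, and so on).
def support_agent(message: str) -> str:
    return f"Thanks for reaching out about: {message}"


wrapper = VoiceWrapper(agent=support_agent, stt=fake_stt, tts=fake_tts)
audio_out = wrapper.handle_turn(b"my order is late")
```

Because the agent is passed in as a plain callable, swapping the stand-ins for real speech models would not require changing the agent itself, which is the property the talk attributes to the wrapping approach.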

Luke demonstrates how Voice Engine can convert a generic chat support agent into a voice agent with a single prompt, which analyzes the codebase and automatically wraps the chat agent for voice interaction. This approach significantly lowers the barrier to adding voice to existing chat systems. ElevenLabs also provides UI components, designed to align with popular design frameworks, that help developers deploy voice-enabled agents quickly.

In conclusion, Luke emphasizes that the future of conversational AI lies in upgrading chat agents to voice agents. ElevenLabs offers two main solutions: Voice Engine, for wrapping existing chat agents with voice, and a full conversational agent platform, for those who want an out-of-the-box solution. He invites interested developers to become design partners and collaborate on advancing voice-enabled AI agents, which he frames as a shift towards more natural and accessible AI interactions.