The company’s blog post brims with the enthusiasm of a 1990s American TV infomercial. WellSaid Labs describes what customers can expect from its “eight new digital voice actors.” Tobin is “energetic and insightful.” Paige is “poised, calm, and expressive.” Ava is “polished, self-assured, and professional.”

Each is based on a real voice actor whose likeness (with permission) has been preserved using AI. Companies can now license these voices to say whatever they need. They simply feed some text into the voice engine, and out comes a crisp audio clip of natural-sounding speech.

WellSaid Labs, a Seattle-based startup spun out of the research nonprofit Allen Institute for Artificial Intelligence, is the latest company to offer AI voices to clients. For now, it specializes in voices for corporate e-learning videos. Other startups make voices for digital assistants, call center operators, and even video game characters.

Not long ago, such deepfake voices were notorious for their use in scam calls and internet fraud. But their steadily improving quality has since piqued the interest of a growing number of companies. Recent breakthroughs in deep learning have made it possible to replicate many of the subtleties of human speech. These voices pause and breathe in all the right places. They can change their style or emotion. You can spot the trick if they speak for too long, but in short audio clips, some have become indistinguishable from humans.

AI voices are also cheap, scalable, and easy to work with. Unlike recordings of human voice actors, synthetic voices can also have their scripts updated in real time, opening up new opportunities for personalized advertising.

But the rise of hyperrealistic fake voices is not without consequences. Human voice actors, in particular, have been left wondering what this means for their livelihoods.

How to fake a voice

Synthetic voices have been around for a while. But the old ones, including the voices of the original Siri and Alexa, simply glued together words and sounds, producing a clunky, robotic effect. Making them sound more natural was a laborious manual task.

Deep learning changed that. Voice developers no longer need to dictate the exact pacing, pronunciation, or intonation of the generated speech. Instead, they can feed several hours of audio into an algorithm and let it learn those patterns on its own.

“If I were Pizza Hut, I certainly wouldn’t sound like Domino’s, and I certainly wouldn’t be like Papa John’s.”

Rupal Patel, founder and CEO of VocaliD

Over the years, researchers have used this basic idea to build increasingly sophisticated voice engines. The one built by WellSaid Labs, for example, uses two primary deep learning models. The first predicts, from a passage of text, the broad strokes of what a speaker will sound like, including accent, pitch, and timbre. The second fills in the details, including breaths and the way the voice resonates in its environment.
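The two-stage design described above can be sketched loosely in code. The sketch below is purely illustrative: the class names, fields, and toy heuristics are invented for this example and bear no relation to WellSaid Labs' actual models, which are trained neural networks at both stages.

```python
# Hypothetical sketch of a two-stage neural TTS pipeline, as described
# in the text: stage 1 predicts broad style (pitch, duration, energy)
# from text; stage 2 fills in acoustic detail and emits samples.
# All names and numbers here are invented for illustration.
from dataclasses import dataclass

@dataclass
class StyleFrame:
    """Per-phoneme style attributes predicted by the first model."""
    phoneme: str
    pitch_hz: float      # stand-in for predicted pitch
    duration_ms: float   # how long the sound is held
    energy: float        # rough proxy for loudness/timbre

def predict_style(text: str) -> list[StyleFrame]:
    """Stage 1 (stand-in): map text to broad prosody. A real system
    would run a trained network here instead of these toy rules."""
    frames = []
    for i, ch in enumerate(text.lower()):
        if ch.isalpha():
            frames.append(StyleFrame(
                phoneme=ch,
                pitch_hz=110.0 + 5.0 * (i % 4),  # toy pitch contour
                duration_ms=80.0,
                energy=0.8,
            ))
    return frames

def render_audio(frames: list[StyleFrame]) -> list[float]:
    """Stage 2 (stand-in): fill in detail and emit samples. Here we
    simply emit one dummy sample per millisecond of duration; a real
    vocoder would synthesize an actual waveform."""
    samples: list[float] = []
    for f in frames:
        samples.extend([f.energy] * int(f.duration_ms))
    return samples

frames = predict_style("Hello there")
audio = render_audio(frames)
print(len(frames), len(audio))
```

The point of the split is modularity: the style model can be retrained or swapped (for a new voice or emotion) without touching the stage that turns style into audio.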

Making a convincing synthetic voice, however, takes more than pressing a button. Part of what makes a human voice so human is its inconsistency, expressiveness, and ability to deliver the same lines in completely different styles, depending on the context.

Capturing these nuances involves finding the right voice actors to supply the appropriate training data and fine-tuning the deep learning models. WellSaid says the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.

AI voices are particularly popular among brands looking to maintain a consistent sound across millions of interactions with customers. With the ubiquity of smart speakers today, and the rise of automated customer service agents and digital assistants embedded in cars and smart devices, brands may need to produce upwards of a hundred hours of audio a month. But they also no longer want to use the generic voices offered by traditional text-to-speech technology, a trend that accelerated during the pandemic as more and more customers skipped in-store interactions to engage with companies virtually.

“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s,” says Rupal Patel, a professor at Northeastern University and the founder and CEO of VocaliD, which makes custom voices that match a company’s brand identity. “These brands have thought about their colors. They’ve thought about their fonts. Now they’ve got to start thinking about the way their voice sounds as well.”

Whereas companies used to have to hire different voice actors for different markets, the northeastern US versus the southern US, or France versus Mexico, some voice AI firms can shift the accent or switch the language of a single voice in different ways. This opens up the possibility of tailoring ads on streaming platforms to who is listening, changing not only the characteristics of the voice but also the words being spoken. A beer ad could tell a listener to stop by a different pub depending on whether it plays in New York or Toronto, for example. Resemble.ai, which designs voices for ads and smart assistants, says it is already working with clients to launch such personalized audio ads on Spotify and Pandora.

The gaming and entertainment industries are also seeing the benefits. Sonantic, a firm that specializes in emotive voices that can laugh, cry, whisper, or shout, works with video game makers and animation studios to supply voice-overs for their characters. Many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production. But Sonantic says a few have started using them throughout the process, perhaps for characters with fewer lines. Resemble.ai and other firms are also working with films and TV shows to patch up actors’ performances when words get garbled or mispronounced.