
Google develops new AI system that can match human voice

03 January, 2018

Tech giant Google’s attempts to develop a natural-sounding voice from text have taken a big leap forward.

The company has developed a text-to-speech artificial intelligence system, called Tacotron 2, that can speak in a very human-like voice, it said in a blog post.

A team of Google researchers wrote in the blog post that the new approach does not use complex linguistic and acoustic features as input. “Instead, we generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts,” they said.

Research into text-to-speech technology has progressed greatly over the past few years and many tech companies have been working on it.

The Google researchers said that they incorporated ideas from past work such as Tacotron and WaveNet to come up with the improved Tacotron 2 system.

How does Tacotron 2 work? The researchers explained that the new system uses a sequence-to-sequence model optimised for text-to-speech to map a sequence of letters to a sequence of features that encode the audio.

“These features, an 80-dimensional audio spectrogram with frames computed every 12.5 milliseconds, capture not only pronunciation of words, but also various subtleties of human speech, including volume, speed and intonation. Finally, these features are converted to a 24 kHz waveform using a WaveNet-like architecture,” the researchers said.
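To put those figures in perspective, the short Python sketch below works out how many audio samples each spectrogram frame covers and how large the feature sequence is for a clip of a given length. The three constants come from the researchers' description; the function names are illustrative only and are not taken from Google's code.

```python
# Figures quoted by the researchers.
SAMPLE_RATE_HZ = 24_000   # output waveform sample rate
FRAME_HOP_MS = 12.5       # one spectrogram frame every 12.5 ms
N_FEATURES = 80           # dimensions per spectrogram frame


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, hop_ms=FRAME_HOP_MS):
    """Number of waveform samples spanned by one frame step."""
    return int(round(sample_rate_hz * hop_ms / 1000))


def frames_for_duration(seconds, hop_ms=FRAME_HOP_MS):
    """Number of whole frames needed to cover `seconds` of audio."""
    return int(seconds * 1000 // hop_ms)


def spectrogram_shape(seconds):
    """(frames, features) shape of the feature sequence for a clip."""
    return (frames_for_duration(seconds), N_FEATURES)


if __name__ == "__main__":
    print(samples_per_frame())     # samples generated per frame
    print(spectrogram_shape(2))    # feature matrix shape for a 2-second clip
```

By this arithmetic, each frame corresponds to 300 waveform samples, and one second of speech is described by 80 frames of 80 values each before being expanded into 24,000 samples.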

The researchers also evaluated the generated voices. “In an evaluation where we asked human listeners to rate the naturalness of the generated speech, we obtained a score that was comparable to that of professional recordings,” they said.
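Listening tests of this kind are commonly summarised by averaging the listeners' ratings into a single score. The sketch below shows one way that could be done, assuming a 1-to-5 rating scale and a normal-approximation 95% confidence interval. The report does not state the scale or the method used, and the ratings here are made-up values for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (mean rating, 95% confidence half-width) for listener ratings.

    Assumes ratings on a 1-5 scale and uses a normal approximation
    (1.96 standard errors) for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    score = mean(ratings)
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return score, half_width


if __name__ == "__main__":
    # Hypothetical ratings, not data from the evaluation described above.
    example_ratings = [5, 4, 5, 4, 4, 5, 3, 5]
    score, half_width = mean_opinion_score(example_ratings)
    print(f"score = {score:.2f} +/- {half_width:.2f}")
```

Two systems would be judged "comparable" in this scheme when their scores are close and their confidence intervals overlap.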

Still, there are some difficult problems to solve.

For example, the new system has difficulties pronouncing complex words such as ‘decorum’ and ‘merlot’. In extreme cases, it can randomly generate strange noises.

Also, the system cannot yet generate audio in real time. “Furthermore, we cannot yet control the generated speech, such as directing it to sound happy or sad. Each of these is an interesting research problem on its own,” the researchers wrote.
