The Human Voice

2023-09-30 22:48:42

Our voice is our main means of communication, and most of us rarely go more than a few minutes without using it. We do not use our voices only for talking; they can be used for many other things, and the most obvious example is singing. The human voice is clearly a means of communication, but it is also a source of pleasure. Nor is it restricted to a handful of sounds: it can produce an intricate range, which would be impossible without the complex system of the throat.

The human voice is our most natural tool for making contact and connecting with the world around us, yet until recently there was a seemingly unbridgeable gap between the human voice and technology. The film 'Her', in which a person communicates seamlessly with a sentient operating system called Samantha, seems to depict something still decades in the future. However, there are already systems that let you control a computer with your voice, and more and more people are using, improving and developing them.

Last year, Google released WaveNet and Baidu released Deep Speech. Both are deep learning networks that generate a voice automatically. These systems learn to imitate the human voice and improve over time. Telling their output apart from real human speech is far harder for listeners than people think. Automatic speech generation is not a solved problem, but deep learning brings us a step closer to computers that actually talk like humans.

LipNet, a deep network created at Oxford University with funding from Alphabet's DeepMind, has achieved a 93% success rate at reading people's lips. The best human lip readers have a success rate of 52%. A team at the University of Washington is building a system that uses lip syncing to add synthesized audio to existing video.

Following the release of Deep Voice earlier this year, Deep Voice 2 produces real-time speech that is almost indistinguishable from a human voice. Even more impressive, it needs only 30 minutes of audio to build a working model, and it can imitate the regional accents of hundreds of different speakers. Crucially, Deep Voice 2 identifies the similarities among hundreds of different speakers and builds a working model of the human voice from them. It then derives a unique voice from that model autonomously. Unlike voice assistants such as Apple's Siri, which require thousands of hours of recordings engineered by hand, Deep Voice 2 needs no guidance or manual intervention.