Hanson Robotics and CereProc Further Collaborate on Human-Centered AI to Enhance Sophia the Robot’s Voice
Sophia the Robot of Hanson Robotics has a distinctive voice recognized around the world. Her speaking and singing voice was developed as a collaboration between Hanson Robotics and CereProc, a text-to-speech solutions company known for synthesizing character-rich voices using deep learning. As a result of this ongoing collaboration, Sophia famously learned to sing, and performed a duet with Jimmy Fallon on the Tonight Show.
Since then, Hanson Robotics and CereProc have continued to enhance Sophia’s voice capabilities and advance synthetic speech technology. In their latest collaboration, Sophia has gained the ability to mimic the inflection, stress, and tone of a human voice. Using a neural network model, Sophia can transform a human’s voice into her own in real time, all while maintaining the desired intonation and stress. In other words, a person can speak into a microphone to show Sophia how to pronounce a new phrase, and Sophia will mimic it in her own voice. Similarly, Sophia can learn to sing new musical passages by precisely copying pre-recorded audio, or by mimicking a person singing in real time, all in her own unique voice.
These new abilities will allow Sophia to deliver more dramatic and entertaining performances. Having more precise control over Sophia’s inflection will allow her to deliver better punchlines, sound more natural, and better convey the hidden meaning behind her words. For example, people often use upward inflection to imply a question, even when giving a declarative statement. To indicate sarcasm, people often use inverse pitch obtrusion, where they lower their pitch for a stressed word rather than raising it. With these new capabilities, Sophia will be able to harness the power of subtle vocal cues widely used by humans to convey deeper layers of meaning.
This new technology allows a human speaker to use their own voice to finely control any synthetic voice. This will be attractive to a variety of industries and applications, including healthcare, customer service, social robotics, filmmaking, game development, and immersive training. For example, a studio could create a synthetic voice for an actor who plays one of the characters, and later have a different person speak into a microphone while the output is rendered in the actor’s text-to-speech voice. This would be very attractive to the film industry, as it would solve scheduling issues caused by an actor’s limited availability and save on long-term production costs.
Not only does this new technology have the potential to revolutionize the text-to-speech industry in film, game development, and social robotics, but you may one day soon experience the overwhelming joy of having your very own sarcastic singing robot.