The Making of Sophia: Sophia's Singing Voice

On November 21st, 2018, Sophia the Robot of Hanson Robotics Limited made history when she sang ‘Say Something’, the duet by A Great Big World and Christina Aguilera, with host Jimmy Fallon on ‘The Tonight Show’, in what is thought to be the first robot-human duet ever performed on live television.

The achievement was the result of a long-standing partnership between Hanson Robotics and CereProc, a text-to-speech (TTS) technology company that creates synthetic voices with character and personality.

CereProc first created Sophia's speaking voice in 2016 by training a machine learning model on many hours of voice recordings made by a member of Hanson Robotics’ character development team. The model captured the unique character and personality of those recordings, giving Sophia her instantly recognizable voice.

This past year, CereProc partnered with Hanson Robotics once again, this time giving Sophia the ability to sing.

Learning How to Sing

Hanson Robotics and CereProc set out to carry the character and personality of Sophia’s speaking voice over into a singing voice.

First, the woman who made the original recordings for Sophia’s voice recorded a series of musical passages in a studio. CereProc then built a new database for singing synthesis and trained a deep neural network (DNN) to reproduce the expression, pitch contours, and timbre of those passages.

The resulting model could then take a sequence of musical notes combined with vocal phonemes (the units of sound that make up human speech) and render it as singing. To add an extra layer of richness, the team used frequency and amplitude modulation to vary the pitch and volume of Sophia’s voice, creating a natural-sounding vibrato that brings her performance to life.
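The vibrato idea can be illustrated with a minimal Python sketch. It is not CereProc's method: it modulates a plain sine tone rather than a synthesized voice, and the function names and the rate, depth, and tremolo values are hypothetical, chosen only for illustration. A single low-frequency oscillator drives both a frequency modulation of the pitch (measured in cents, where 1200 cents is one octave) and an amplitude modulation of the volume.

```python
import math


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to Hz, using A4 (note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def sing_note_with_vibrato(
    f0_hz: float,
    duration_s: float,
    sample_rate: int = 16000,
    vibrato_rate_hz: float = 5.5,       # hypothetical LFO rate, for illustration only
    vibrato_depth_cents: float = 50.0,  # hypothetical pitch swing, for illustration only
    tremolo_depth: float = 0.1,         # hypothetical loudness swing, for illustration only
) -> list[float]:
    """Render a sine tone at f0_hz with sinusoidal pitch (FM) and volume (AM) modulation."""
    n_samples = int(duration_s * sample_rate)
    samples = []
    phase = 0.0
    for i in range(n_samples):
        t = i / sample_rate
        # Low-frequency oscillator shared by both modulations.
        lfo = math.sin(2.0 * math.pi * vibrato_rate_hz * t)
        # Frequency modulation: pitch swings up to +/- vibrato_depth_cents around f0.
        freq = f0_hz * 2.0 ** (vibrato_depth_cents * lfo / 1200.0)
        # Amplitude modulation: loudness follows the same LFO, scaled so the peak is 1.0.
        amp = (1.0 + tremolo_depth * lfo) / (1.0 + tremolo_depth)
        samples.append(amp * math.sin(phase))
        # Accumulate phase so the changing frequency yields a continuous waveform.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples
```

For example, `sing_note_with_vibrato(midi_to_hz(69), 1.0)` returns one second of samples for A4 (440 Hz), which could be converted to 16-bit integers and written out with Python's standard `wave` module. Accumulating phase sample by sample, rather than evaluating `sin(2π·f(t)·t)` directly, keeps the waveform continuous as the frequency changes.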

Applications

This new technology has potential applications in both the music industry and the field of social robotics. AI singing-voice synthesis could serve as a musical instrument, as automated vocal accompaniment for musical performances, or as a teaching tool for new musicians. In social robotics, singing could make robots more effective in areas such as healthcare, elder care, and education. For example, robots could sing to lift patients’ mood or provide cognitive stimulation, or create songs that help teach young students in a creative and memorable way.

While Sophia might have been the first humanoid robot to perform a human-robot duet, she certainly won’t be the last!

Carolyn Ayers, Hanson Robotics
