Voice calls made over the internet with clients such as Google Duo, Skype or Facebook Messenger often suffer from audio packet loss and jitter, which leave audible gaps at the receiver’s end. For instance, 20% of Duo calls lose more than 3% of their total audio, and 10% of calls lose more than 8%. To counter this, a packet loss concealment (PLC) module is used to fill the gaps and produce a better output. Going a step ahead of the competition, Google Duo now uses WaveNetEQ, a PLC system based on DeepMind’s WaveRNN technology, which fully synthesizes the raw waveform of the missing audio for a more natural-sounding call.

How WaveNetEQ works

To make the audio that replaces missing packets sound as natural as possible, WaveNetEQ extracts contextual information from the call and generates plausible speech that preserves the speaker’s voice characteristics. The most recent audio is fed to a conditioning network as a reference, and the model uses it to predict the waveform one sample at a time.
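The sample-by-sample idea can be illustrated with a toy loop. This is only a sketch under loud assumptions: `conditioning_features`, `toy_predict_next` and `conceal_gap` are hypothetical names invented for illustration, and the predictor is a trivial slope extrapolation standing in for a neural network. It is not WaveNetEQ's model or API; it only shows how each generated sample is fed back as context for the next one.

```python
def conditioning_features(recent_audio):
    """Toy stand-in for a conditioning network (assumption, not the real model):
    summarise the recent audio as its peak absolute level."""
    return max(abs(x) for x in recent_audio)


def toy_predict_next(context, condition):
    """Hypothetical predictor: continue the last slope, clipped to the peak level."""
    guess = 2 * context[-1] - context[-2]
    return max(-condition, min(condition, guess))


def conceal_gap(recent_audio, gap_length, predict_next=toy_predict_next, context_size=4):
    """Generate `gap_length` samples autoregressively from the recent audio."""
    if len(recent_audio) < 2:
        raise ValueError("need at least two samples of recent audio")
    condition = conditioning_features(recent_audio)
    context = list(recent_audio[-context_size:])
    generated = []
    for _ in range(gap_length):
        sample = predict_next(context, condition)
        generated.append(sample)
        # Slide the window: drop the oldest sample, append the generated one.
        context = context[1:] + [sample]
    return generated
```

Passing a different `predict_next` callable is where a trained model would plug in; the loop itself stays the same.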

After the missing audio is synthesized, it is merged with the real-time audio stream so that the transition is smooth and virtually unnoticeable. The more data samples this technology gets, the better it becomes at crossfading the synthesized audio with the packets that arrive correctly.
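Crossfading itself is a simple operation, and a minimal sketch may help. The linear fade below is an assumption chosen for illustration (the article does not say which fade curve Duo uses), and `crossfade` is a hypothetical helper, not part of any real library.

```python
def crossfade(synth_tail, real_head):
    """Linearly blend the end of the synthesized audio into the start of the real audio.

    Both arguments are equal-length sequences of samples covering the overlap.
    """
    if len(synth_tail) != len(real_head):
        raise ValueError("segments must have the same length")
    n = len(synth_tail)
    blended = []
    for i, (s, r) in enumerate(zip(synth_tail, real_head)):
        # Weight of the real audio rises steadily across the overlap.
        w = (i + 1) / (n + 1)
        blended.append((1.0 - w) * s + w * r)
    return blended
```

With a three-sample overlap the real audio is weighted 0.25, 0.5 and 0.75 in turn, so the synthesized signal fades out as the received one fades in.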

Practical implementation

Because real-world VoIP calls are made on different hardware and every speaker’s voice is different, the current model is trained on speech from 100 different speakers in 48 different languages. WaveNetEQ also takes noisy environments into account, such as answering a phone in a crowded station or in a restaurant.

To make sure that the model is not producing false syllables, its output is evaluated using the Google Cloud Speech-to-Text API. So far there has been little difference in word error rate, which puts the tech on the verge of success already. WaveNetEQ is currently used in all Duo calls made on Pixel 4 phones and is steadily being rolled out to other models. As the AI model improves, one can expect smoother, jitter-free audio on internet calls in the near future.
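For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The sketch below computes it from two plain strings; `word_error_rate` is a hypothetical helper, and obtaining the transcripts from a speech recognizer is assumed to happen elsewhere.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Comparing this score for transcripts of the original audio and of the concealed audio indicates whether the synthesized segments introduce words that were never spoken.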