A recent announcement from a Canadian startup is stirring up the media. The company introduced an AI model capable of synthesizing a person's voice from just a one-minute audio sample. In other words, you can get anyone to say anything you want.
The system, named Lyrebird after the Australian bird, relies on deep learning models developed at the University of Montréal, where the startup is based. The developers initially worked on a research paper that looked at using neural networks to generate audio from a series of samples. This study later became the basis for their speech synthesis model. They state that Lyrebird can “compress voice DNA into a unique key and use this key to generate anything with its corresponding voice”. It can generate 1,000 sentences in less than half a second. It even lets users control the emotion of the speech, such as sympathy or anger, whatever suits your mood.
TechCrunch called the technology a “voice mimic for the fake news era”, while The Inquirer described the company as a “sinister startup”. Lyrebird replied with a press release addressed to developers around the world, aiming to raise awareness of the existence of such technology and questioning the reliability of audio evidence in courts and for other uses.