Imitating people’s speech patterns precisely could bring trouble

Cloning voices

May 2017
Published 2017-05-16 16:00

The Economist

Apr 20th 2017

Not any more. Software exists that can store slivers of recorded speech a mere five milliseconds long, each annotated with a precise pitch. These can be shuffled together to make new words, and tweaked individually so that they fit harmoniously into their new sonic homes. This is much cheaper than conventional voice banking, and permits novel uses to be developed. With little effort, a wife can lend her voice to her blind husband’s screen-reading software. A boss can give his to workplace robots. A Facebook user can listen to a post apparently read aloud by its author. Parents often away on business can personalise their children’s wirelessly connected talking toys. And so on. At least, that is the vision of Gershon Silbert, boss of VivoText, a voice-cloning firm in Tel Aviv.
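To make the idea of shuffling pitch-annotated slivers concrete, here is a minimal toy sketch in Python. It is not VivoText's software, whose internals are not described here. The 16 kHz sample rate, the pure sine tones standing in for recorded speech, the linear crossfade and every name in the code (`Sliver`, `pick_sliver`, `join`) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 16_000                          # samples per second (assumed)
SLIVER_MS = 5                                 # sliver length quoted in the article
SLIVER_LEN = SAMPLE_RATE * SLIVER_MS // 1000  # 80 samples per sliver


@dataclass
class Sliver:
    """A short slice of audio annotated with its pitch (hypothetical type)."""
    pitch_hz: float
    samples: List[float]


def make_sliver(pitch_hz: float) -> Sliver:
    """Stand-in for a recorded sliver: a pure tone at the given pitch."""
    samples = [math.sin(2 * math.pi * pitch_hz * n / SAMPLE_RATE)
               for n in range(SLIVER_LEN)]
    return Sliver(pitch_hz, samples)


def pick_sliver(bank: List[Sliver], target_hz: float) -> Sliver:
    """Choose the stored sliver whose annotated pitch is closest to the target."""
    return min(bank, key=lambda s: abs(s.pitch_hz - target_hz))


def join(slivers: List[Sliver], overlap: int = 16) -> List[float]:
    """Concatenate slivers, blending `overlap` samples at each seam
    with a linear crossfade so the joins are less abrupt."""
    out = list(slivers[0].samples)
    for s in slivers[1:]:
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)       # weight of the incoming sliver
            idx = len(out) - overlap + i
            out[idx] = (1 - w) * out[idx] + w * s.samples[i]
        out.extend(s.samples[overlap:])
    return out


bank = [make_sliver(hz) for hz in (100.0, 120.0, 140.0, 160.0)]
targets = [105.0, 118.0, 155.0]               # desired pitch contour
chosen = [pick_sliver(bank, hz) for hz in targets]
audio = join(chosen)
```

Real systems would store slices of actual speech and adjust each one to fit its neighbours; the sketch shows only the select-and-stitch step.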

Words to the wise

Next year VivoText plans to release an app that lets users select the emphasis, speed and level of happiness or sadness with which individual words and phrases are produced. Mr Silbert refers to the emotive quality of the human voice as “the ultimate instrument”. Yet this power also troubles him. VivoText licenses its software to Hasbro, an American toymaker keen to sell increasingly interactive playthings. Hasbro is aware, Mr Silbert notes, that without safeguards a prankster might, for example, type curses on his mother’s smartphone in order to see a younger sibling burst into tears on hearing them spoken by a toy using mum’s voice.

More troubling, any voice—including that of a stranger—can be cloned if decent recordings are available on YouTube or elsewhere. Researchers at the University of Alabama, Birmingham, led by Nitesh Saxena, were able to use Festvox to clone voices based on only five minutes of speech retrieved online. When tested against voice-biometrics software like that used by many banks to block unauthorised access to accounts, more than 80% of the fake voices tricked the computer. Alan Black, one of Festvox’s developers, reckons systems that rely on voice-ID software are now “deeply, fundamentally insecure”.

And, lest people get smug about the inferiority of machines, humans have proved only a little harder to fool than software is. Dr Saxena and his colleagues asked volunteers if a voice sample belonged to a person whose real speech they had just listened to for about 90 seconds. The volunteers recognised cloned speech as such only half the time (ie, no better than chance). The upshot, according to George Papcun, an expert witness paid to detect faked recordings produced as evidence in court, is the emergence of a technology with “enormous potential value for disinformation”. Dr Papcun, who previously worked as a speech-synthesis scientist at Los Alamos National Laboratory, a weapons establishment in New Mexico, ponders on things like the ability to clone an enemy leader’s voice in wartime.

As might be expected, countermeasures to sniff out such deception are being developed. Nuance Communications, a maker of voice-activated software, is working on algorithms that detect tiny skips in frequency at the points where slices of speech are stuck together. Adobe, best known as the maker of Photoshop, an image-editing software suite, says that it may encode digital watermarks into speech fabricated by a voice-cloning feature called VoCo it is developing. Such wizardry may help computers flag up suspicious speech. Even so, it is easy to imagine the mayhem that might be created in a world which makes it easy to put authentic-sounding words into the mouths of adversaries—be they colleagues or heads of state.
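The notion of detecting skips in frequency where slices are joined can be illustrated with a toy Python sketch. It is not Nuance's algorithm, which is not described here. It assumes a 16 kHz sample rate, uses pure tones in place of speech, estimates frequency crudely by counting zero crossings in 20-millisecond frames, and uses an arbitrary 20 Hz threshold. The function names are hypothetical.

```python
import math

SAMPLE_RATE = 16_000   # samples per second (assumed)
FRAME_LEN = 320        # 20 ms analysis frame (assumed)


def tone(freq_hz, n_samples, phase=0.5):
    """Synthetic stand-in for speech: a sine wave at a fixed frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE + phase)
            for n in range(n_samples)]


def frame_frequency(frame):
    """Crude frequency estimate: a sine crosses zero twice per cycle."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings * SAMPLE_RATE / (2 * len(frame))


def find_skips(signal, threshold_hz=20.0):
    """Return indices of frames whose estimated frequency jumps by more
    than `threshold_hz` relative to the previous frame."""
    freqs = [frame_frequency(signal[i:i + FRAME_LEN])
             for i in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN)]
    return [k for k in range(1, len(freqs))
            if abs(freqs[k] - freqs[k - 1]) > threshold_hz]


# A "spliced" signal: 200 Hz material stuck onto 250 Hz material.
spliced = tone(200.0, 3 * FRAME_LEN) + tone(250.0, 3 * FRAME_LEN)
# A "genuine" signal with no joins.
genuine = tone(200.0, 6 * FRAME_LEN)

suspect_frames = find_skips(spliced)   # frame 3 begins at the join
clean_frames = find_skips(genuine)     # nothing flagged
```

Natural speech changes pitch all the time, so a practical detector would need far finer measurements and a model of what ordinary variation looks like; the sketch shows only the basic idea of flagging an abrupt jump.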

