AI learns how to fool speech-to-text. That’s bad news for voice assistants
A pair of computer scientists at the University of California, Berkeley, has developed an AI-based attack that targets speech-to-text systems. With their method, no matter what an audio file sounds like to a human listener, the transcribed text will be whatever the attacker wants it to be.
This one is pretty cool, but it’s also another entry for the “terrifying uses of AI” category.
The team, Nicholas Carlini and Professor David Wagner, was able to trick Mozilla’s popular open-source DeepSpeech speech-to-text system by, essentially, turning it on itself. In a white paper published last week, the researchers state:
Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second) … Our attack works with 100% success, regardless of the desired transcription, or initial source phrase being spoken. By starting with an arbitrary waveform instead of speech (such as music), we can embed speech into audio that should not be recognized as speech; and by choosing silence as the target, we can hide audio from a speech-to-text system.
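To get a feel for how a barely changed waveform can come out as a completely different transcription, here is a minimal sketch in Python. It is not the researchers' attack and it does not touch DeepSpeech: a tiny linear "transcriber" with random weights stands in for the real model, and every name in it (`scores`, `predict`, `targeted_perturbation`, `relative_distortion`, `demo`) is hypothetical and chosen only for illustration. The idea it shows is the general one behind targeted adversarial examples: nudge the input in small steps in the direction that favors the attacker's label until the model's output flips.

```python
import math
import random


def scores(weights, x):
    """Toy 'transcriber' (assumption, not DeepSpeech): one linear score per label."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def predict(weights, x):
    """Return the index of the highest-scoring label."""
    s = scores(weights, x)
    return max(range(len(s)), key=s.__getitem__)


def targeted_perturbation(weights, x, target, step=0.01, max_iters=5000):
    """Nudge x in small steps until the toy model outputs `target`."""
    adv = list(x)
    for _ in range(max_iters):
        s = scores(weights, adv)
        # Strongest label other than the one the attacker wants.
        rival = max((k for k in range(len(s)) if k != target), key=s.__getitem__)
        if s[target] > s[rival]:
            break
        # For a linear model, the gradient of (target score - rival score)
        # with respect to the input is just the difference of the weight rows.
        grad = [t - r for t, r in zip(weights[target], weights[rival])]
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        adv = [a + step * g / norm for a, g in zip(adv, grad)]
    return adv


def relative_distortion(x, adv):
    """Size of the change relative to the size of the original signal."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(adv, x)))
    den = math.sqrt(sum(b * b for b in x))
    return num / den


def demo(n=1000, labels=3, seed=0):
    rng = random.Random(seed)
    # Random weights play the role of a trained model (illustrative only).
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(labels)]
    # A plain sine wave plays the role of the original audio.
    x = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    original = predict(weights, x)
    target = (original + 1) % labels  # any label other than the original
    adv = targeted_perturbation(weights, x, target)
    return original, target, predict(weights, adv), relative_distortion(x, adv)
```

Calling `demo()` returns the original label, the attacker's target, the label the toy model assigns to the perturbed signal, and how large the change was relative to the original. A real speech-to-text model is far more complex than this, so treat the sketch as intuition only, not as a reproduction of the paper's results.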