Audio Deepfakes Are a Big Threat: Here’s How Researchers Expose Them

Imagine the following scenario: A phone rings. An office worker answers it and hears his panicked boss tell him that she forgot to wire money to the new contractor before she left and needs him to do it. She gives him the bank transfer information, and with the money transferred, the crisis has been averted.

The worker leans back in his chair, takes a deep breath, and watches as his boss walks through the door. The voice on the other end of the call was not his boss. In fact, it wasn’t even human. The voice he heard was fake audio, a machine-generated audio sample designed to sound exactly like his boss.

Attacks like this using recorded audio have already occurred, and conversational audio deepfakes may not be far off.

Deepfakes, both audio and video, have become possible only with the development of sophisticated machine learning technologies in recent years.

Voice-only communications greatly expand the possibilities for attackers to use deepfakes. (Image: Kanok Sulaiman/Moment/Getty Images)

Deepfakes have brought with them a new level of uncertainty around digital media. To detect deepfakes, many researchers have turned to analyzing visual artifacts (tiny glitches and inconsistencies) found in video deepfakes.

Audio deepfakes potentially pose an even greater threat because people often communicate verbally without video, for example through phone calls, radio, and voice recordings. These voice-only communications greatly expand the possibilities for attackers to use deepfakes.

To detect audio deepfakes, we and our research colleagues at the University of Florida have developed a technique that measures the acoustic and fluid-dynamic differences between speech samples created organically by human speakers and those generated synthetically by computers.


Real versus fake voices

Attackers need only 10 to 20 seconds of audio of the targeted person’s voice. (Image: CSA-Printstock/DigitalVision Vectors/Getty Images)

Humans vocalize by forcing air over the various structures of the vocal tract, including the vocal cords, tongue, and lips. By precisely repositioning these structures, you alter the acoustic properties of your vocal tract, allowing you to create more than 200 distinct sounds, or phonemes.

However, human anatomy fundamentally limits the acoustic behavior of these different phonemes, resulting in a relatively small range of correct sounds for each.

Audio deepfakes, in contrast, are created by first allowing a computer to listen to audio recordings of a targeted victim’s speech. Depending on the exact techniques used, the computer may need to hear as little as 10 to 20 seconds of audio. This audio is used to extract key information about the unique aspects of the victim’s voice.

The attacker selects a phrase for the deepfake to speak and then, using a modified text-to-speech algorithm, generates an audio sample that sounds as if the victim were saying the selected phrase. This process of creating a single spoofed audio sample can be accomplished in a matter of seconds, potentially allowing attackers enough flexibility to use the spoofed voice in a conversation.

Detecting audio deepfakes

By estimating the anatomy responsible for creating the observed speech, it is possible to identify whether the audio was generated by a person or a computer. (Image: Shutterstock)

The first step in differentiating human-produced speech from deepfake-generated speech is to understand how to acoustically model the vocal tract. Fortunately, scientists have techniques for estimating what someone, or some creature such as a dinosaur, would sound like based on anatomical measurements of its vocal tract.


We did the reverse. By inverting many of these same techniques, we were able to extract an approximation of a speaker’s vocal tract during a segment of speech. This effectively let us peer into the anatomy of the speaker who created the audio sample.
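Our exact estimation pipeline is not spelled out in this article, but a classic way to approximate a vocal tract from speech is linear predictive coding: model each short frame as the output of a concatenated lossless-tube filter, then convert the filter’s reflection coefficients into relative tube areas. The sketch below is a minimal illustration of that idea, assuming librosa and numpy are available; the file name, frame position, model order, and sign conventions are illustrative assumptions, not the researchers’ implementation.

```python
import numpy as np
import librosa

def lpc_to_reflection(a):
    """Convert LPC coefficients [1, a1, ..., ap] into reflection (PARCOR)
    coefficients using the backward Levinson-Durbin recursion."""
    a = np.asarray(a, dtype=float)
    order = len(a) - 1
    k = np.zeros(order)
    cur = a[1:].copy()                       # a1..ap of the order-p model
    for m in range(order, 0, -1):
        k[m - 1] = cur[m - 1]
        if abs(k[m - 1]) >= 1.0:
            raise ValueError("unstable LPC frame")
        if m > 1:
            cur = (cur[:m - 1] - k[m - 1] * cur[:m - 1][::-1]) / (1.0 - k[m - 1] ** 2)
    return k

def vocal_tract_areas(frame, order=16, lip_area=1.0):
    """Estimate relative cross-sectional areas of a lossless-tube model
    of the vocal tract from one speech frame."""
    a = librosa.lpc(frame.astype(float), order=order)
    k = lpc_to_reflection(a)
    # Kelly-Lochbaum tube relation: A_{i+1} / A_i = (1 - k_i) / (1 + k_i).
    # (Sign conventions for k vary between references.)
    areas = [lip_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return np.array(areas)

# Usage sketch: analyze one 25 ms frame of a voiced segment.
# Real pipelines pre-emphasize, window, and restrict to voiced frames.
y, sr = librosa.load("sample.wav", sr=16000)   # "sample.wav" is a placeholder
frame = y[8000:8000 + 400]
print(vocal_tract_areas(frame))
```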

From here, we hypothesized that deepfake audio samples would not be constrained by the same anatomical limitations that humans are. In other words, when analyzed, the fake audio samples would imply vocal tract shapes that do not exist in people.

The results of our tests not only confirmed our hypothesis, but also revealed something interesting. By extracting estimates of the vocal tract from fake audio, we found that the estimates were often comically wrong. For example, it was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape.

This realization demonstrates that deepfake audio, even when it convinces human listeners, is far from indistinguishable from human-generated speech. By estimating the anatomy responsible for creating the observed speech, it is possible to identify whether the audio was generated by a person or a computer.
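As a rough illustration of that idea, a detector could flag a clip whenever the estimated tract geometry falls outside a plausible human range for too many frames. The following is a hypothetical plausibility check, assuming per-frame diameter estimates are already available; the bounds and tolerance are made-up placeholders, not the study’s actual model or thresholds.

```python
import numpy as np

# Hypothetical plausibility bounds (cm) for adult vocal tract diameters;
# the study's real model and thresholds are not reproduced here.
HUMAN_DIAMETER_CM = (0.5, 4.5)

def looks_synthetic(diameters_per_frame, tolerance=0.2):
    """Flag a clip whose estimated vocal tract diameters are anatomically
    implausible (e.g., uniformly straw-thin) in too many frames."""
    implausible_fraction = [
        np.mean((d < HUMAN_DIAMETER_CM[0]) | (d > HUMAN_DIAMETER_CM[1]))
        for d in diameters_per_frame
    ]
    return float(np.mean(implausible_fraction)) > tolerance

# diameters_per_frame would come from per-frame vocal tract estimates such
# as the area function sketched above (d = 2 * sqrt(A / pi)).
```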

Why does it matter? Today’s world is defined by the digital exchange of media and information. Everything from news to entertainment to conversations with loved ones typically happens through digital exchanges. Even in their infancy, deepfake video and audio undermine the confidence people have in these exchanges, effectively limiting their usefulness.


If the digital world is to remain a critical resource for information in people’s lives, effective and secure techniques for determining the source of an audio sample are crucial.

This article was originally published on The Conversation by Logan Blue and Patrick Traynor at the University of Florida. Read the original article here.
