Introduction

In September 2019, the Reynolds Journalism Institute (RJI) in Columbia, Missouri, opened a deepfake verification competition. The challenge of the RJI Student Innovation Competition was to create a program, tool or prototype for photo, video or audio verification.

More Details Here

RJI 3rd Place Winner:

On February 8, our team, FakeLab, won 3rd place at the RJI Student Innovation Competition.

What are Deepfakes

Deepfakes are images, videos or voices that have been manipulated through the use of sophisticated machine-learning algorithms to make it almost impossible to differentiate between what is real and what isn’t.

Popular techniques for creating audio deepfakes.

Speech Synthesis:

Speech synthesis is the artificial production of human speech.
The computer or instrument used for this is called a speech synthesizer.
Text-to-speech (TTS) synthesis is the production of speech from normal language text.

Voice Conversion:

Voice conversion transforms the speech of a (source) speaker so that it sounds like the speech of a different (target) speaker.

FakeLab

We at the University of Missouri-Kansas City came up with a solution for detecting deepfakes, called FakeLab.

FakeLab is a tool for journalists, media houses and tech companies that helps identify manipulated photos, videos and audio shared on their platforms.

Below are the detailed steps of our work on deepfake audio.

Working

To discern between real and fake audio, the detector uses visual representations of audio clips called spectrograms, which are also used to train speech synthesis models.

Spectrograms are visual representations of sound. If you look closely, you’ll notice that the blue and green bands on the bottom spectrogram are blurrier than the ones on top. That’s because the one on the bottom is a fake!

While to the unsuspecting ear they sound basically identical, spectrograms of real audio vs. fake audio actually *look* different from one another.
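To make the idea of a spectrogram concrete, here is a minimal sketch of how one can be computed from raw samples. It is not the FakeLab preprocessing code: the function names, the frame length of 64, the hop of 32, and the synthetic 1 kHz tone are all assumptions chosen for illustration, and it uses only the Python standard library. A real pipeline would normally use an audio library and map the result onto the mel scale.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to taper each frame before the DFT."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Return magnitude spectra: one row per time frame, one column per frequency bin."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        # Naive DFT over the non-negative frequencies (bins 0 .. frame_len / 2).
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


# A synthetic 1 kHz tone sampled at 8 kHz stands in for a real recording (assumption).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one column of pixels in the picture: time runs along one axis, frequency along the other, and the magnitude becomes the colour. Blurry or smeared bands in that picture are the kind of cue a detector can learn from.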

The data

We trained the detector on Google’s ASVspoof 2019 dataset, released in 2019 to encourage the development of audio deepfake detection. The dataset contains over 25,000 audio clips, featuring both real and fake clips from a variety of male and female speakers.

The model

The deepfake detector model is a deep neural network that uses temporal convolutions. Here’s a high-level overview of the model’s architecture:

First, raw audio is preprocessed and converted into a mel-frequency spectrogram — this is the input for the model. The model performs convolutions over the time dimension of the spectrogram, then uses masked pooling to prevent overfitting. Finally, the output is passed into a dense layer and a sigmoid activation function, which ultimately outputs a predicted probability between 0 (fake) and 1 (real).
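The flow described above can be sketched as a single forward pass. This is a rough illustration with random, untrained weights, not the trained model: the layer sizes, the function names, and the reading of "masked pooling" as an average over only the non-padded time frames are all assumptions on our part.

```python
import math
import random


def temporal_conv(spec, kernels, bias):
    """1-D convolution over time with ReLU.

    spec: [T][F] spectrogram; kernels: [C][K][F]; returns [T - K + 1][C].
    """
    k_len = len(kernels[0])
    out = []
    for t in range(len(spec) - k_len + 1):
        frame_out = []
        for c, kernel in enumerate(kernels):
            acc = bias[c]
            for dt in range(k_len):
                acc += sum(w * x for w, x in zip(kernel[dt], spec[t + dt]))
            frame_out.append(max(0.0, acc))
        out.append(frame_out)
    return out


def masked_mean_pool(features, valid_len):
    """Average each channel over the first valid_len frames, ignoring padding."""
    valid = features[:valid_len]
    return [sum(col) / len(valid) for col in zip(*valid)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(spec, valid_frames, kernels, conv_bias, dense_w, dense_b):
    """Return a probability between 0 (fake) and 1 (real)."""
    k_len = len(kernels[0])
    feats = temporal_conv(spec, kernels, conv_bias)
    # Only conv outputs computed entirely from real (non-padded) frames count.
    valid_out = max(1, valid_frames - k_len + 1)
    pooled = masked_mean_pool(feats, valid_out)
    logit = sum(w * p for w, p in zip(dense_w, pooled)) + dense_b
    return sigmoid(logit)


# Hypothetical sizes and random weights, for illustration only.
random.seed(0)
n_mels, k_len, channels = 8, 3, 4
kernels = [
    [[random.uniform(-0.1, 0.1) for _ in range(n_mels)] for _ in range(k_len)]
    for _ in range(channels)
]
conv_bias = [0.0] * channels
dense_w = [random.uniform(-0.1, 0.1) for _ in range(channels)]
dense_b = 0.0

# A 15-frame clip padded with 5 silent frames to a fixed length of 20.
valid_frames = 15
clip = [[random.random() for _ in range(n_mels)] for _ in range(valid_frames)]
padded = clip + [[0.0] * n_mels for _ in range(5)]
prob_real = predict(padded, valid_frames, kernels, conv_bias, dense_w, dense_b)
```

With this kind of masking, whatever fills the padded frames has no effect on the output, so clips of different lengths can share one batch without the padding leaking into the prediction.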

Results

Dessa’s baseline model achieved 99%, 95%, and 85% accuracy on the train, validation, and test sets respectively. The differing performance is caused by differences between the three datasets. While all three datasets feature distinct and different speakers, the test set uses a different set of fake audio generating algorithms that were not present in the train or validation set.

Put more simply, our detector model can currently identify over 90% of the fake audio clips it is shown.
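Overall accuracy and the share of fake clips caught are two different numbers, so it helps to see how each is computed from the model's output probabilities. The sketch below assumes a decision threshold of 0.5 and uses made-up toy values; they are not results from our model.

```python
def evaluate(probs, labels, threshold=0.5):
    """Return (accuracy, fake_detection_rate) for labels 0 = fake, 1 = real."""
    preds = [1 if p >= threshold else 0 for p in probs]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    # Of the clips that are truly fake, how many did we call fake?
    fake_preds = [p for p, y in zip(preds, labels) if y == 0]
    fake_detection_rate = fake_preds.count(0) / len(fake_preds)
    return accuracy, fake_detection_rate


# Toy values for illustration only (assumption, not measured data).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_probs = [0.1, 0.3, 0.6, 0.2, 0.9, 0.8, 0.4, 0.7]
toy_accuracy, toy_fake_rate = evaluate(toy_probs, toy_labels)
```

Reporting both figures matters when real and fake clips are not evenly balanced, since a high accuracy can hide a low detection rate on the rarer class.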

FakeLab UI

To run inference on test files, we built a web UI that keeps a record of every inference run.

Here is the demo

Github

All the source code we used for model training is available on GitHub here

March 30, 2020 Dhairya Chandra
