Fakelab – A Deepfake Audio Detection Tool

Introduction

In September 2019, the Reynolds Journalism Institute (RJI) in Columbia, Missouri opened a deepfake verification competition. The RJI Student Innovation Competition challenged participants to create a program, tool, or prototype for photo, video, or audio verification.

More details here

RJI 3rd Place Winner:

On February 8, our team FakeLab won 3rd place in the RJI Student Innovation Competition.

What are Deepfakes?

Deepfakes are images, videos or voices that have been manipulated through the use of sophisticated machine-learning algorithms to make it almost impossible to differentiate between what is real and what isn’t.

Popular techniques for creating audio deepfakes

Speech Synthesis:

  • Speech synthesis is the artificial production of human speech.
  • The computer system or instrument used for this purpose is called a speech synthesizer.
  • Text-To-Speech (TTS) synthesis is the production of speech from normal language text.
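As a loose illustration of the text-to-speech idea — text goes in, a waveform comes out — here is a toy "synthesizer" that maps each character to a fixed sine tone. This is purely hypothetical: real TTS systems map text to phonemes and prosody with learned acoustic models, not per-character tones.

```python
import numpy as np

SR = 16000   # sample rate in Hz
DUR = 0.1    # seconds of audio per character

def toy_tts(text):
    """Map each character to a fixed sine tone and concatenate the tones.

    Only a toy: real TTS predicts phoneme durations and spectra with
    learned models; this just shows the text -> waveform pipeline shape.
    """
    t = np.arange(int(SR * DUR)) / SR
    tones = [np.sin(2 * np.pi * (200 + 20 * (ord(ch) % 26)) * t)
             for ch in text.lower()]
    return np.concatenate(tones)

wave = toy_tts("hello")
print(len(wave))  # 5 characters x 1600 samples = 8000
```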

Voice Conversion:

Voice conversion transforms the speech of a source speaker so that it sounds like the speech of a different (target) speaker.
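A crude way to get intuition for voice conversion is pitch shifting: moving one acoustic property of the source voice toward a target. The sketch below shifts pitch by simple resampling — a hypothetical illustration only, since real voice conversion preserves timing and uses learned spectral mappings rather than interpolation.

```python
import numpy as np

def shift_pitch(signal, factor):
    """Resample so every frequency in the signal is multiplied by `factor`.

    Note: this toy version also shortens or lengthens the clip; real
    voice-conversion systems preserve duration and convert speaker
    characteristics with learned models.
    """
    src_positions = np.arange(0, len(signal) - 1, factor)
    return np.interp(src_positions, np.arange(len(signal)), signal)

sr = 16000
t = np.arange(sr) / sr
source = np.sin(2 * np.pi * 200 * t)   # a 200 Hz "source speaker" tone
converted = shift_pitch(source, 1.5)   # now oscillates at roughly 300 Hz
```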

FakeLab

We at the University of Missouri–Kansas City came up with a solution to detect deepfakes, called FakeLab.

FakeLab is a tool for journalists, media houses, and tech companies that helps identify manipulated photos, videos, and audio shared on their platforms.


Below are the detailed steps of our work on deepfake audio detection.


Working

To distinguish between real and fake audio, the detector uses visual representations of audio clips called spectrograms, which are also used to train speech synthesis models.

A spectrogram plots a clip's frequency content over time. In a side-by-side comparison of a real and a fake clip, the frequency bands of the fake clip's spectrogram look noticeably blurrier than the real clip's: synthesis models tend to smear the fine spectral detail present in genuine speech.
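For readers who want to see how a spectrogram is computed, here is a minimal short-time Fourier transform sketch in NumPy. The frame and hop sizes are illustrative; the actual training pipeline's parameters may differ.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform:
    slide a Hann-windowed frame across the signal and take the
    magnitude of each frame's one-sided FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

Each row of the result is one time slice; each column is one frequency bin, so the energy of the 440 Hz tone concentrates in a single column.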

The data

We trained the detector on the ASVspoof 2019 dataset, which includes synthetic speech released by Google that year to encourage the development of audio deepfake detection. The dataset contains over 25,000 audio clips, featuring both real and fake speech from a variety of male and female speakers.

The model

The deepfake detector model is a deep neural network that uses temporal convolutions. Here's a high-level overview of the model's architecture:
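To make "temporal convolution" concrete, here is a minimal sketch of a dilated causal 1-D convolution — the building block such networks stack (with growing dilation) to widen their receptive field over time. This is an illustration of the operation, not our production model.

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1-D convolution: each output sample depends only
    on the current and past input samples, never on future ones."""
    k = len(kernel)
    pad = (k - 1) * dilation   # left-pad so output length equals input length
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * xp[pad + i - j * dilation] for j in range(k))
        for i in range(len(x))
    ])

x = np.array([0., 1., 2., 3.])
print(causal_conv1d(x, [1.0, -1.0]))      # first difference of the input
print(causal_conv1d(x, [1.0, -1.0], 2))   # difference two steps back
```

Stacking layers with dilations 1, 2, 4, … lets the network see long stretches of audio while keeping each layer cheap.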

Results

Dessa’s baseline model achieved 99%, 95%, and 85% accuracy on the train, validation, and test sets, respectively. The gap in performance reflects differences between the three splits: while all three feature distinct speakers, the test set also contains fake audio generated by algorithms that do not appear in the train or validation sets.

FakeLab UI 

To run inference on test files, we created a web UI that keeps a record of every inference run on those files.
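The record-keeping behind such a UI can be sketched as a simple log of inference runs. This is a hypothetical stand-in for the actual FakeLab backend, just to show the shape of the data being tracked.

```python
import datetime

class InferenceLog:
    """Toy stand-in for the record-keeping behind an inference web UI:
    every run on a test file is stored and can be queried later."""

    def __init__(self):
        self.records = []

    def log(self, filename, label, score):
        """Record one inference run on one file."""
        self.records.append({
            "file": filename,
            "label": label,    # e.g. "real" or "fake"
            "score": score,    # the model's confidence in that label
            "time": datetime.datetime.now().isoformat(),
        })

    def history(self, filename):
        """Return all past inference records for one file."""
        return [r for r in self.records if r["file"] == filename]

log = InferenceLog()
log.log("clip_01.wav", "fake", 0.97)
print(log.history("clip_01.wav")[0]["label"])  # prints "fake"
```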

Here is a demo


GitHub

All the source code we used for model training is available on GitHub here
