Departed Summer 2023

Deep-Fake-Detector

faceswap detection · ML model · Archived

[image: Final 4x5 grid of held-out test faces with True/Pred/Confidence overlay text on each cell — top two rows labeled 'Fake' on the left, bottom two rows labeled 'Real', every cell reading True: __, Pred: __, Conf: 0.99–1.00 at ~94% accuracy]

Held-out test pass — every cell called correctly at 0.99–1.00 confidence.

Departure

UTD's summer research class taught machine learning the long way — model architectures, use cases, a detour through reinforcement learning — and ended on a single brief: pick a thing and train a CNN on it. Deep-fake detection picked itself. GAN portraits and InsightFace face-swaps were already trivial to generate, but no one in the room had a binary classifier that could tell a real face from a swapped one. The terrain was a straightforward Conv2D stack; the actual work was upstream — finding a real-people dataset that was actually real, and inventing a deepfake dataset that didn't exist as images.

Approach

Python
TensorFlow
Keras
InsightFace
TensorFlow.js
BlazeFace
React

I couldn't find a public deepfake image dataset; every one I came across was video. Had to generate my own.

Field log

Summer 2023 — UTD research class
Two weeks of fundamentals, then a final brief: pick a thing and train a CNN on it. The topic picked itself.
The pipeline
Six steps on the whiteboard: collect a dataset, clean it, augment it, define the layers, train and test, ship. The model was step four. Almost everything that mattered happened before it.
The 'real' half — FFHQ
Most face datasets I looked at quietly mixed in stylized portraits and renders. NVlabs' Flickr-Faces-HQ was built as a benchmark for GANs: 1024×1024 photos, a wide age and ethnicity range, and unmistakably real people. Took it whole as the real class.
[image: Collage of sample real faces from FFHQ (Flickr-Faces-HQ) by NVlabs — a single high-resolution portrait alongside a grid of diverse faces spanning age, ethnicity, lighting, and pose, originally a GAN benchmark dataset]
The real class. Lifted whole from a GAN benchmark.
The 'fake' half — InsightFace
Every public deepfake dataset I could find was video. Built my own with InsightFace, an open-source 2D/3D face analysis toolbox built on PyTorch and MXNet: pick a source face, pick a target, swap, save the frame. Strictly a face swap rather than a fully synthesized deepfake, but it leaves the same kind of artifact a detector has to catch.
[image: InsightFace face-swap example: a Tom Holland Spider-Man photo labeled 'Source' plus a Tobey Maguire Spider-Man photo labeled 'Target' arrowed into a 'Result' frame with Holland's face composited onto Maguire's body]
Source + target → result. The fake class, generated one frame at a time.
Architecture
Sequential CNN — four Conv2D blocks (32 → 64 → 128 → 128, 3×3, ReLU) with a MaxPool after each, Flatten, two Dropout(0.5) gates around a Dense(512) head, sigmoid out for binary. Standard shape, no surprises.
[image: CNN architecture diagram for the deep-fake detector — Sequential model with Conv2D(32,3x3,relu) + MaxPool, Conv2D(64) + MaxPool, Conv2D(128) + MaxPool, Conv2D(128) + MaxPool, Flatten, Dropout(0.5), Dense(512,relu), Dropout(0.5), Dense(1,sigmoid) for binary real-vs-fake classification]
The stack. Input on top, sigmoid verdict on the bottom.
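As a rough sketch, the stack described above could be written in Keras like this. The 150×150×3 input size, the default 'valid' padding, the 2×2 pool size, the Adam optimizer, and both function names are assumptions; the write-up only fixes the layer order, filter counts, kernel size, and activations. The TensorFlow import sits inside the builder so the shape arithmetic can be run on its own.

```python
def conv_stack_output_shape(size=150, filters=(32, 64, 128, 128)):
    """Trace (height, width, channels) through the Conv2D + MaxPool blocks.

    Assumes square inputs, 3x3 kernels with 'valid' padding, and 2x2 pooling.
    """
    for _ in filters:
        size = size - 2    # a 3x3 'valid' convolution trims one pixel per edge
        size = size // 2   # 2x2 max pooling halves the size, rounding down
    return (size, size, filters[-1])


def build_detector(input_size=150):
    """Build the Sequential CNN (requires TensorFlow to be installed)."""
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(input_size, input_size, 3)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # single real-vs-fake score
    ])
    # Optimizer choice is an assumption; binary cross-entropy pairs with sigmoid.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


print(conv_stack_output_shape(150))  # (7, 7, 128)
```

Under the assumed 150×150 input, Flatten would see 7 × 7 × 128 = 6,272 values, so most of the trainable weights would sit in the Dense(512) layer rather than in the convolutions.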
Trial and error — unbalanced data
10 epochs on the unbalanced dataset landed at 60%: the model was mostly guessing the majority class. 50 epochs got to 75%. 100 epochs on a balanced dataset got to 94%. The architecture didn't move between any of them; only the data and the patience did.
[image: Three pairs of training and validation accuracy/loss line graphs stacked vertically — top pair from a 10-epoch run topping out near 60%, middle pair from a 50-epoch run reaching 75%, bottom pair from a 100-epoch run climbing to 94% on a balanced dataset]
Three runs, same architecture. Different shape of data, different shape of curve.
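To make the majority-class trap concrete, here is a small sketch in plain Python. The 600/400 split, the label encoding (1 = fake, 0 = real), the file names, and the helper names are illustrative assumptions, not the project's actual numbers or code. It shows that always predicting the larger class scores exactly that class's share, and one simple way to rebalance by undersampling.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def undersample(samples, labels, seed=0):
    """Randomly drop examples so every class keeps as many as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    keep = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        balanced.extend((s, label) for s in rng.sample(group, keep))
    rng.shuffle(balanced)
    return balanced


# Hypothetical unbalanced set: 600 fakes (1) and 400 reals (0).
labels = [1] * 600 + [0] * 400
samples = [f"img_{i}.png" for i in range(len(labels))]

print(majority_baseline_accuracy(labels))                 # 0.6
balanced = undersample(samples, labels)
balanced_labels = [label for _, label in balanced]
print(majority_baseline_accuracy(balanced_labels))        # 0.5
```

On a balanced set the do-nothing baseline drops to 50%, so any accuracy above that has to come from the model actually separating the classes.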
Final result
Held-out test pass at ~94% accuracy overall. In a 4×5 sample grid of unseen faces (two rows fake, two rows real), every cell was called correctly at 0.99–1.00 confidence.
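The confidence figures presumably come straight from the sigmoid output. A minimal sketch of that mapping, assuming the output is read as P(real) with a 0.5 threshold; the write-up does not say which class is the positive one, and `verdict` is a hypothetical helper:

```python
def verdict(p_real, threshold=0.5):
    """Turn a sigmoid output into a (label, confidence) pair.

    p_real is the model output in [0, 1], read here as P(real).
    Confidence is the probability assigned to whichever class was chosen.
    """
    if not 0.0 <= p_real <= 1.0:
        raise ValueError("sigmoid output must lie in [0, 1]")
    if p_real >= threshold:
        return "Real", p_real
    return "Fake", 1.0 - p_real


for p in (0.997, 0.004, 0.55):
    label, conf = verdict(p)
    print(f"{label}: {conf:.2%}")
# Real: 99.70%
# Fake: 99.60%
# Real: 55.00%
```

Read this way, a cell at 0.99–1.00 means the raw score sat almost at one end of the sigmoid, while a score near 0.5 would show up as a low-confidence call.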
Browser deploy
Converted the Keras .h5 model into TensorFlow.js format (a model.json plus binary weight shards), dropped it into a React/Tailwind page, and put BlazeFace upstream of the CNN to crop the face out of any uploaded image. Drag a photo into the upload zone, get a verdict in the browser. Pushed to GitHub Pages as deepdetect.github.io.
[image: deepdetect.github.io drag-and-drop upload zone for testing real vs deepfake faces in the browser via TensorFlow.js + BlazeFace — large dashed-border drop area with 'Click to upload or drag and drop, PNG, JPG or GIF (MAX. 800x400px)' prompt and a 'Powered by TensorFlow.js and BlazeFace' footer]
The shipped page. CNN running in the browser, BlazeFace cropping the face before it ever sees the model.
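The crop step is simple geometry once a detector has returned a bounding box. The sketch below is in Python for consistency with the training code, although the shipped page does this in the browser; the 20% margin, the square crop, and the `square_crop_box` helper are assumptions rather than the page's actual logic.

```python
def square_crop_box(top_left, bottom_right, image_w, image_h, margin=0.2):
    """Expand a face box into a square crop, then clamp it to the image.

    top_left / bottom_right are (x, y) pixel corners of the detected face.
    Returns integer (left, top, right, bottom). Near an image edge the
    clamped crop may no longer be square.
    """
    x1, y1 = top_left
    x2, y2 = bottom_right
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Use the longer side so the whole face fits, plus some context around it.
    half = max(x2 - x1, y2 - y1) * (1 + margin) / 2
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(image_w, int(round(cx + half)))
    bottom = min(image_h, int(round(cy + half)))
    return left, top, right, bottom


# Hypothetical detection inside a 640x480 upload.
print(square_crop_box((100, 100), (200, 220), 640, 480))  # (78, 88, 222, 232)
```

The cropped region would then be resized to the CNN's input size before inference, so the classifier only ever sees a face-sized patch like the ones it was trained on.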

From the gallery

[image: DEEPFAKE DETECTION mobile web UI showing a live verdict — uploaded photo of a young man in a grey UTD shirt with a red bounding box drawn around his face, a blue 'Confirm and Detect' button, and a green result box reading 'Image Appears Real — Confidence: 100.00%']

Live verdict on a UTD selfie. Real, 100.00%.

[image: Accuracy progression bar chart for the three training runs — three bars climbing left to right, labeled '10 epochs · 60%', '50 epochs · 75%', and '100 epochs · 94%']

60 → 75 → 94. Three bars, one architecture.

What I came back with

94% after 100 epochs

deepdetect.github.io

Lesson from the terrain

The architecture stayed the same across all three runs. The only things that moved 60% to 94% were the shape of the data and how long the model was allowed to look at it. Most of the work on a CNN happens before the model is defined: locating a 'real' dataset that's actually real, generating a 'fake' dataset because I couldn't find a public one that came as images, and balancing the two so the loss curve is measuring the thing you think it is. Once those weights existed, they were a few megabytes of TensorFlow.js model files the browser could load. The model turned out to be the easy part to ship.
