Josh Ramirez

Entry 21 of 23

Departed Summer 2023

faceswap detection · ML model · Archived
Deepfake Detection web app upload page — large white card over teal and blue background panels with warning text, best-results note, dashed drag-and-drop upload zone, TensorFlow.js and BlazeFace footer, and copyright line
The browser version — a TensorFlow.js model behind a drag-and-drop face upload.

Departure

UTD's summer research class taught machine learning the long way — model architectures, use cases, a detour through reinforcement learning — and ended on a single brief: pick a thing and train a CNN on it. Deep-fake detection picked itself. GAN portraits and InsightFace face-swaps were already trivial to generate, but no one in the room had a binary classifier that could tell a real face from a swapped one. The terrain was a straightforward Conv2D stack; the actual work was upstream — finding a real-people dataset that was actually real, and inventing a deepfake dataset that didn't exist as images.

Approach

7 tools

  • Python
  • TensorFlow
  • Keras
  • InsightFace
  • TensorFlow.js
  • BlazeFace
  • React

No public deepfake image dataset — every available one was video. Had to generate my own.

Field log

8 entries

  1. Summer 2023 — UTD research class

    Two weeks of fundamentals — model architectures, use cases, a detour through reinforcement learning — and a final brief: pick a thing and train a CNN on it. The brief picked itself.

  2. The pipeline

    Six steps on the whiteboard: collect a dataset, clean it, augment it, define the layers, train and test, ship. The model was step four. Almost everything that mattered happened before it.

  3. The 'real' half — FFHQ

    Most face datasets quietly mixed in stylized portraits and renders. NVlabs' Flickr-Faces-HQ was built as a benchmark for GANs — high-res, wide age and ethnicity range, and unmistakably real photos. Took it whole as the real class.

    Collage of sample real faces from FFHQ (Flickr-Faces-HQ) by NVlabs — a single high-resolution portrait alongside a grid of diverse faces spanning age, ethnicity, lighting, and pose, originally a GAN benchmark dataset
    The real class. Lifted whole from a GAN benchmark.
  4. The 'fake' half — InsightFace

    Every public deepfake dataset was video. Built my own with InsightFace, an open-source 2D/3D face analysis toolbox with PyTorch and MXNet backends: pick a source face, pick a target, swap, save the frame. A face swap is not technically a deepfake, but the resulting image is indistinguishable from one.

    InsightFace face-swap example: a Tom Holland Spider-Man photo labeled 'Source' plus a Tobey Maguire Spider-Man photo labeled 'Target' arrowed into a 'Result' frame with Holland's face composited onto Maguire's body
    Source + target → result. The fake class, generated one frame at a time.
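The swap loop described in this entry can be sketched roughly as follows. This is a minimal sketch, not the project's actual script: the `buffalo_l` detection pack, the `inswapper_128.onnx` swapper file (which has to be obtained separately and is only a hypothetical local path here), the function names, and the output naming scheme are all assumptions.

```python
from itertools import product
from pathlib import Path


def swap_pairs(sources, targets):
    """Pair every source image with every target image, skipping self-pairs."""
    return [(s, t) for s, t in product(sources, targets) if s != t]


def generate_fakes(sources, targets, out_dir, swapper_path="inswapper_128.onnx"):
    """Write one swapped frame per (source, target) pair into out_dir."""
    # Imported lazily so swap_pairs() works without these packages installed.
    import cv2
    import insightface
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(name="buffalo_l")        # assumed face detection pack
    app.prepare(ctx_id=0, det_size=(640, 640))  # ctx_id=0: first GPU, -1: CPU
    swapper = insightface.model_zoo.get_model(swapper_path)  # assumed local file

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, (src, tgt) in enumerate(swap_pairs(sources, targets)):
        src_img, tgt_img = cv2.imread(str(src)), cv2.imread(str(tgt))
        if src_img is None or tgt_img is None:
            continue  # unreadable file
        src_faces, tgt_faces = app.get(src_img), app.get(tgt_img)
        if not src_faces or not tgt_faces:
            continue  # no face detected in one of the images
        # Paste the source identity onto the first face found in the target.
        result = swapper.get(tgt_img, tgt_faces[0], src_faces[0], paste_back=True)
        cv2.imwrite(str(out_dir / f"fake_{i:05d}.jpg"), result)
```

Calling `generate_fakes(source_paths, target_paths, "data/fake")` would fill a folder with swapped frames; the folder name is illustrative.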
  5. Architecture

    Sequential CNN — four Conv2D blocks (32 → 64 → 128 → 128, 3×3, ReLU) with a MaxPool after each, Flatten, two Dropout(0.5) gates around a Dense(512) head, sigmoid out for binary. Standard shape, no surprises.

    CNN architecture diagram for the deep-fake detector — Sequential model with Conv2D(32,3x3,relu) + MaxPool, Conv2D(64) + MaxPool, Conv2D(128) + MaxPool, Conv2D(128) + MaxPool, Flatten, Dropout(0.5), Dense(512,relu), Dropout(0.5), Dense(1,sigmoid) for binary real-vs-fake classification
    The stack. Input on top, sigmoid verdict on the bottom.
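For reference, the stack above could be written in Keras along these lines. The layer order, filter counts, kernel size, dropout rate, and activations come from the entry. The 128×128 RGB input size, the 2×2 pool size, the Adam optimizer, and the function name `build_detector` are assumptions made for the sketch.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_detector(input_shape=(128, 128, 3)):
    """Sequential CNN for binary real-vs-fake classification."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),           # assumed input size
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),              # assumed 2x2 pooling
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),    # probability of one class
    ])
    # Binary cross-entropy pairs with the single sigmoid output;
    # the optimizer choice is an assumption.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

`build_detector().summary()` prints the layer-by-layer shapes, which is a quick way to check the stack against the diagram.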
  6. Trial and error — unbalanced data

    10 epochs landed at 60% — the model was guessing the majority class. 50 epochs got to 75%. 100 epochs on a balanced dataset got to 94%. The architecture didn't move between any of them; only the data and the patience did.

    Three pairs of training and validation accuracy/loss line graphs stacked vertically — top pair from a 10-epoch run topping out near 60%, middle pair from a 50-epoch run reaching 75%, bottom pair from a 100-epoch run climbing to 94% on a balanced dataset
    Three runs, same architecture. Different shape of data, different shape of curve.
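Why an unbalanced set can stall near the majority-class share, and one way to balance it, can be sketched in plain Python. The 60/40 split below is purely illustrative (the entry does not give the real class ratio), and undersampling is only one balancing option, not necessarily the one used here.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy of a model that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def undersample(samples, seed=0):
    """Trim every class to the size of the smallest one.

    samples is a list of (path, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append(path)
    n = min(len(paths) for paths in by_label.values())
    balanced = [(p, label)
                for label, paths in by_label.items()
                for p in rng.sample(paths, n)]
    rng.shuffle(balanced)
    return balanced


# Illustrative data: 60 real, 40 fake.
samples = ([(f"real_{i}.jpg", "real") for i in range(60)]
           + [(f"fake_{i}.jpg", "fake") for i in range(40)])
before = majority_baseline([label for _, label in samples])   # 0.6
balanced = undersample(samples)
after = majority_baseline([label for _, label in balanced])   # 0.5
```

On the balanced set, any accuracy above 50% has to come from the model actually separating the classes rather than from the class ratio.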
  7. Final result

    Held-out test pass at ~94% accuracy. A 4×5 grid of unseen faces — two rows real, two rows fake — every cell called correctly at 0.99–1.00 confidence.

  8. Browser deploy

    Converted the Keras .h5 model into TensorFlow.js format (a model.json plus weight shards), dropped it into a React/Tailwind page, and put BlazeFace upstream of the CNN to crop the face out of any uploaded image. Drag a photo into the upload zone, get a verdict in the browser. Pushed to GitHub Pages as deepdetect.github.io.

    deepdetect.github.io drag-and-drop upload zone for testing real vs deepfake faces in the browser via TensorFlow.js + BlazeFace — large dashed-border drop area with 'Click to upload or drag and drop, PNG, JPG or GIF (MAX. 800x400px)' prompt and a 'Powered by TensorFlow.js and BlazeFace' footer
    The shipped page. CNN running in the browser, BlazeFace cropping the face before it ever sees the model.
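The crop step can be illustrated with a little box arithmetic. The shipped page does this in JavaScript; it is sketched in Python here to match the training code. BlazeFace reports a face as a top-left and a bottom-right corner. The margin, the squaring, and the function name `square_crop_box` are assumptions, not the page's actual logic.

```python
def square_crop_box(top_left, bottom_right, image_size, margin=0.2):
    """Return (left, top, right, bottom) for a padded crop around a face.

    top_left / bottom_right: (x, y) corners of the detected face box.
    image_size: (width, height) of the uploaded image.
    margin: fraction by which the longer side is enlarged (assumed value).
    The box is square unless clamping to the image edges trims it.
    """
    (x0, y0), (x1, y1) = top_left, bottom_right
    width, height = image_size
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2          # face centre
    half = max(x1 - x0, y1 - y0) * (1 + margin) / 2
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(width, int(round(cx + half)))
    bottom = min(height, int(round(cy + half)))
    return left, top, right, bottom
```

The resulting crop would then be resized to whatever input size the CNN was trained on before it is passed to the model.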

From the gallery

2 figures

Deepfake Detection web app after upload — centered white card with upload guidance and an uploaded portrait of a young man, with a red bounding box marking the selected face to analyze
Face selected. The model only sees the crop.
Final model result training charts — fourth attempt at 100 epochs and 94 percent accuracy, with training and validation accuracy rising toward 0.9 to 1.0 and training and validation loss dropping across the run
Fourth attempt. 100 epochs, 94 percent.

What I came back with

94% after 100 epochs

deepdetect.github.io

Lesson from the terrain

The architecture was fixed from attempt two onward. The only things that moved accuracy from 60% to 94% were the shape of the data and how long the model was allowed to look at it. Most of the work on a CNN happens before the model is defined: locating a 'real' dataset that's actually real, generating a 'fake' dataset because the public ones don't exist as images, and balancing the two so the loss curve is measuring the thing you think it is. Once those weights existed, they were a few megabytes of TensorFlow.js files (a JSON manifest plus weight shards) that the browser could load directly. The model turned out to be the easy part to ship.