Josh Ramirez

Entry 14 of 23

Departed Spring 2025

handwritten notes · Web app · Shipped
APUSH Notes Viewer grid at learnq.org showing scanned note cards for Period 8 and 9 practice, SAQ Practice, Progressive Era DBQ Practice, American Imperialism, Period 6 Urbanization, and a floating filter and search bar
The binder, flattened into a searchable grid.

Departure

APUSH notes were useful only while I remembered which packet they were in. The pile had handwritten pages, DBQs, SAQs, quizzes, Woodward notes, and review sheets, but no search box. The expedition was to turn the physical class archive into a website: photograph every page, clean the scans, ask Gemini for OCR and APUSH metadata, then make the whole thing filterable by period, theme, type, and keyword.

Approach

6 tools

  • Python
  • OpenCV
  • Pillow
  • Gemini structured output
  • Next.js
  • DreamHost

No separate backend in the first version. The images and JSON were baked into the frontend, which made localhost simple and deployment heavy.

Field log

10 entries

  1. Spring 2025 — the pile

    The source material was not clean. It was school-paper archaeology: pencil notes, packet holes, answer lines, cartoons, rubrics, and half-cropped phone photos. Some pages were standalone; some belonged to packets that needed to stay together.

    Two photographed handwritten APUSH notes pages about industrialization, the Great Railroad Strike of 1877, Taylorism, Social Darwinism, labor unions, Haymarket, and the American Federation of Labor
    The actual input: useful notes, bad searchability.
  2. Step 1 — every page becomes an image

    Individual pages stayed in the root. Packets became folders, with the pages inside each folder. That rule mattered because a single Gemini request could represent either one loose page or a multi-page packet.

    macOS Finder showing a messy APUSH image directory with numbered folders 1 through 55 mixed with raw IMG_7393.jpeg style page photos
    Before formatting: folders and raw camera filenames mixed together.
  3. Step 2 — names had to become structure

    A Python natural-sort pass renamed packet pages as packet_page, so 3_2.png is page 2 of packet 3, and numbered standalone pages with single increasing integers. The filenames became a data model small enough for the later scripts to trust.

    macOS Finder list view showing APUSH packet folders where folder 1 contains 1_1.png, 1_2.png, 1_3.png, 1_4.png; folder 2 contains 2_1.png and 2_2.png; and later folders continue the same pattern
    Packet pages after the rename script made the structure explicit.
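
The rename pass could look roughly like the sketch below. It is not the original script: the function names, the `IMAGE_SUFFIXES` set, and the choice to keep each file's extension are assumptions made for illustration. The natural-sort key compares digit runs as numbers so that `IMG_9` lands before `IMG_10`.

```python
import re
from pathlib import Path

# Assumption: the extensions the phone photos and scans used.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers, so IMG_9 < IMG_10."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd index holds a run of digits.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def plan_packet_renames(folder: Path) -> list[tuple[Path, Path]]:
    """Pair each image in a packet folder with its <packet>_<page> name."""
    images = sorted(
        (p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES),
        key=lambda p: natural_key(p.name),
    )
    return [
        (src, folder / f"{folder.name}_{page}{src.suffix.lower()}")
        for page, src in enumerate(images, start=1)
    ]


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Rename files according to the plan, refusing to overwrite anything."""
    for src, dst in plan:
        if src == dst:
            continue
        if dst.exists():
            raise FileExistsError(dst)
        src.rename(dst)
```

Splitting the plan from the rename makes it possible to print and inspect the mapping before touching any files, which matters when the camera filenames are the only record of page order.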
  4. Step 3 — clean the paper

    The batch processor walked every JPEG, cropped the page, found document corners, applied a four-point transform, bumped contrast and brightness, wrote a PNG, and deleted the original JPEG. The goal was not perfect archival quality; it was a cleaner page for OCR and a less ugly website image.

    Side-by-side APUSH Period 8 DBQ worksheet before and after OpenCV processing; the left image has dark table background and perspective skew, while the right image is brighter, straighter, and cropped close to the page
    Before and after the OpenCV crop, transform, and enhancement pass.
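
The geometry behind a four-point transform can be sketched without any image library. The snippet below only covers the preparatory part: putting four detected corners into a consistent order and working out the size of the flattened page. The function names are hypothetical and the ordering heuristic is a common one, not necessarily what the batch processor used.

```python
import math

Point = tuple[float, float]


def order_corners(points: list[Point]) -> tuple[Point, Point, Point, Point]:
    """Return corners as (top-left, top-right, bottom-right, bottom-left).

    Heuristic: x + y is smallest at the top-left and largest at the
    bottom-right; y - x is smallest at the top-right and largest at the
    bottom-left. It assumes the page is not rotated close to 45 degrees.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return top_left, top_right, bottom_right, bottom_left


def target_size(corners: tuple[Point, Point, Point, Point]) -> tuple[int, int]:
    """Width and height of the flattened page, using the longer opposing edges."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)
```

In an OpenCV pipeline, the ordered corners and a destination rectangle of that size would typically be passed as float32 arrays to `cv2.getPerspectiveTransform`, and the resulting matrix to `cv2.warpPerspective`.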
  5. Step 4 — Gemini as the clerk

    Gemini got the image or packet plus a strict JSON schema: apush_period, APUSH themes, title, document_type, tags, summary, and full_text. The useful part was not just OCR. It could decide that a page belonged to Period 6, tag it with Taylorism and Haymarket, and summarize the point of the notes.

    Code editor showing Gemini JSON metadata for an APUSH note with apush_period Period 6: 1865 - 1898, themes WXT POL CUL, document_type Handwritten Notes, tags including Great Railroad Strike of 1877 and Haymarket Affair, full_text, summary, and title Industrialization and Labor
    Structured output turned OCR into searchable metadata.
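
Even with a strict schema, it is cheap to check each response before writing it to disk. The sketch below validates a JSON string against the seven fields listed above. The key name `themes` and the field types are assumptions inferred from the screenshot description, and no Gemini API call is shown here.

```python
import json

# Assumed field names and types; only the field list comes from the write-up.
REQUIRED_FIELDS = {
    "apush_period": str,
    "themes": list,
    "title": str,
    "document_type": str,
    "tags": list,
    "summary": str,
    "full_text": str,
}


def validate_note(raw: str) -> dict:
    """Parse a model response and raise ValueError if it does not fit the schema."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("response is not a JSON object")
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
        elif expected is list and not all(isinstance(x, str) for x in record[field]):
            problems.append(f"{field} should contain only strings")
    if problems:
        raise ValueError("; ".join(problems))
    return record
```

A failed validation is a natural point to retry the request or set the packet aside for a manual look, rather than letting a malformed record reach the site.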
  6. After the run — packets plus JSON

    Each folder ended with its cleaned page images and a JSON file beside them. For the app, that was the whole database: the cleaned PNGs were the records, and the JSON files were the index.

    macOS Finder list view showing APUSH packet folders expanded with cleaned PNG pages and matching JSON metadata files such as 1.json, 2.json, 3.json, 4.json, and 5.json
    Cleaned pages plus one metadata file per packet.
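
Treating folders as a database implies some step that gathers them into something the frontend can load. A minimal sketch of that step follows, assuming one JSON file per packet folder and PNG pages beside it. The `id` and `pages` keys and the function name are hypothetical.

```python
import json
from pathlib import Path


def build_index(data_root: Path) -> list[dict]:
    """Merge each packet folder's JSON and page list into one list of records."""

    def folder_key(path: Path):
        # Numeric folder names ("1" .. "55") sort by value, others by name.
        if path.name.isdecimal():
            return (0, int(path.name), "")
        return (1, 0, path.name)

    index = []
    folders = sorted((p for p in data_root.iterdir() if p.is_dir()), key=folder_key)
    for folder in folders:
        # Shorter names first, so 1_2.png comes before 1_10.png.
        pages = sorted(
            (p.name for p in folder.glob("*.png")),
            key=lambda name: (len(name), name),
        )
        for json_path in sorted(folder.glob("*.json")):
            metadata = json.loads(json_path.read_text(encoding="utf-8"))
            index.append({"id": folder.name, "pages": pages, **metadata})
    return index
```

Writing the result out with `json.dumps` would give a single file for the grid to read, at the cost of shipping every packet's full text in one payload.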
  7. The site — search first

    The frontend was not trying to be a generic note app. It rendered a grid of scanned pages and put the APUSH-specific filters at the bottom: period, theme, document type, and text search. A note detail page showed the scan next to the extracted title, period, summary, tags, themes, and document type.

    APUSH note detail page on localhost showing a scanned Period 7 review page on the left and a metadata panel on the right titled Period 7-1898-1945-1920-1930s Review + WW2 Entry with summary, tags, themes, and document type
    The payoff: handwritten paper beside searchable structured context.
  8. Apr 2025 — Vercel said no

    Because the first version shipped all data statically, the build folder got huge. Vercel was the obvious host until the deployment failed under the weight of the notes. The architecture was convenient locally and expensive at deploy time.

    Vercel deployment details page showing Build Failed with the message data is too long and build logs reporting missing public data JSON files
    The static-data shortcut hit the hosting wall.
  9. DreamHost — shipped, but rough

    DreamHost could host the files, so learnq.org became the public version. It worked best as my own localhost tool, though. On DreamHost the certificate was flaky, and hydration sometimes lagged long enough that clicking a note changed the URL while the home grid kept rendering.

    DreamHost file manager open to the learnq.org directory with _next, data, notes, .htaccess, 404.html, favicon, and index files
    Static files uploaded into DreamHost.
    Chrome warning page for https://learnq.org saying Your connection is not private with NET::ERR_CERT_AUTHORITY_INVALID
    The certificate issue that made the public version feel fragile.
    APUSH Notes Viewer at learnq.org showing the grid and search controls while the browser status bar points at learnq.org/notes/5, illustrating a navigation or hydration mismatch
    The link changed before the page reliably did.
  10. Apr 24, 2025 — completed

    The core loop worked better than expected. I could run the script, leave it alone, and come back to packets that had been OCRed, summarized, tagged, and filed. Searching my own APUSH notes by period was immediately useful, even if the public deployment was not the final shape.

From the gallery

4 figures

APUSH Notes Viewer grid at learnq.org with scanned note cards and a bottom filter bar for period, theme, type, search text, and reset
Grid view: scanned notes as searchable cards.
APUSH note detail page showing a handwritten review page and extracted metadata including summary, tags, APUSH themes, and document type
Detail view: the scan and its generated metadata.
Before and after APUSH worksheet processing with the cleaned page brighter and straighter than the raw photo
Image cleanup before OCR.
Gemini structured JSON output for APUSH Period 6 industrialization notes
Gemini output as the metadata layer.

What I came back with

~60 packets indexed with OCR, tags, periods, and summaries

learnq.org

Lesson from the terrain

This was the first LLM workflow that felt obviously useful instead of futuristic for its own sake. The model was good at the boring clerk work: reading messy pages, extracting key terms, choosing a period, and returning valid structured output again and again. The weak part was architecture, not intelligence. Baking every scan and JSON file into the frontend made the app easy to build and easy to use on localhost, but it made hosting brittle. The next version wants Supabase for metadata, compressed thumbnails for the grid, and a thinner frontend that fetches notes instead of carrying the whole binder in the build.
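
One way to picture the thinner frontend is to split each record into a light row for the grid and a full record fetched on demand. The sketch below is hypothetical: the `id` key, the `GRID_FIELDS` choice, and the function names are assumptions, and it says nothing about how Supabase itself would store the data.

```python
import json

# Assumed set of fields the grid needs; full_text and summary stay out.
GRID_FIELDS = ("id", "title", "apush_period", "themes", "document_type", "tags")


def split_for_fetch(records: list[dict]) -> tuple[list[dict], dict[str, dict]]:
    """Return (light grid rows, full records keyed by id)."""
    grid = [{key: r[key] for key in GRID_FIELDS if key in r} for r in records]
    details = {r["id"]: r for r in records}
    return grid, details


def json_size(obj) -> int:
    """Size in bytes of obj serialized as UTF-8 JSON."""
    return len(json.dumps(obj, ensure_ascii=False).encode("utf-8"))
```

Comparing `json_size(grid)` with `json_size(records)` gives a quick estimate of how much the initial load shrinks. The trade-off is that keyword search over `full_text` would have to move to wherever the full records live.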
