TCG Card Scanner

A mobile app that identifies trading cards when you point a camera at them, then retrieves pricing and recent sales data. Supports both English and Japanese Pokémon cards across all eras.

Multi-scan - photograph up to 20 cards at once

Live scan - continuous 1-by-1 identification in real time

Note: price fetch times for newly scanned cards reflect live web scraping latency. This would be eliminated with a dedicated pricing API - an intentional upgrade path if the project scales.

TL;DR - Two main use cases:

  • Display case / binder scan: photograph an entire spread of cards at once. The app detects each card, identifies it, and streams prices back one-by-one as they resolve - no waiting for the full batch.
  • Bulk 1-by-1 scan: hold cards up to the camera in sequence. Each card is identified automatically and added to a running session list in real time, with a total value updating as you go.
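A running session list like the one in the bulk mode needs a guard against adding a card that was only glimpsed mid-swap. The following is a minimal Python sketch of such a gate: a match is accepted only once the same card ID has been the top result in two consecutive frames. The class name, field names, card IDs, prices, and the rule that an empty frame resets the gate are assumptions for illustration, not the app's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LiveSession:
    """Running list of confirmed cards plus a running total (hypothetical shape)."""

    required_frames: int = 2                  # frames that must agree before adding
    cards: List[str] = field(default_factory=list)
    total: float = 0.0
    _candidate: Optional[str] = None          # card ID seen in the previous frame(s)
    _streak: int = 0                          # consecutive frames agreeing on _candidate
    _last_added: Optional[str] = None         # avoids re-adding a card still in view

    def observe(self, card_id: Optional[str], price: float = 0.0) -> bool:
        """Feed one frame's top match; return True if a card was added."""
        if card_id is None:
            # Empty frame: the card left the view, so reset the gate.
            self._candidate, self._streak, self._last_added = None, 0, None
            return False
        if card_id == self._candidate:
            self._streak += 1
        else:
            # A different card than last frame: start counting again.
            self._candidate, self._streak = card_id, 1
        if self._streak >= self.required_frames and card_id != self._last_added:
            self.cards.append(card_id)
            self.total += price
            self._last_added = card_id
            return True
        return False
```

With this shape, a one-frame phantom match between two real cards never reaches the list, and a card held in view for many frames is added only once. Scanning two copies of the same card back to back would need an empty frame in between, which is a trade-off of this particular sketch.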

Both modes use fine-tuned on-device and server-side ML models: a custom-trained YOLO11n for card detection, and a fine-tuned CLIP visual embedding model for image-based identification - neither is a third-party recognition API.

The app fuses three signals to identify a card: YOLO detects each card's boundaries, ML Kit reads its text on-device, and a fine-tuned CLIP model matches it visually. Prices, recent sales, and trend graphs are scraped live from PriceCharting and cached in Redis. It grew across 31 iterations - from a single-card OCR scanner to a multi-model streaming pipeline - with each version answering a concrete failure in the one before it.
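The "scraped live and cached" flow is a cache-aside pattern: check the cache, scrape only on a miss, and store the result with an expiry. The sketch below shows the idea in Python with an in-memory stand-in for Redis so it runs without a server. The `TTLCache` class, the `get_price` function, the `price:` key prefix, and the one-hour TTL are all hypothetical; the real key layout and expiry are not stated here.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis-style set-with-expiry / get (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: dict, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def get_price(
    card_id: str,
    cache: TTLCache,
    scrape: Callable[[str], dict],
    ttl_seconds: float = 3600.0,  # assumed TTL, not taken from the project
) -> dict:
    """Return cached pricing if fresh; otherwise scrape once and cache the result."""
    key = f"price:{card_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    fresh = scrape(card_id)  # the slow path: live scraping latency lands here
    cache.set(key, fresh, ttl_seconds)
    return fresh
```

Only the first lookup of a card within the TTL window pays the scraping cost, which is why the latency noted above applies to newly scanned cards.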

31 iterations | 47,442 cards indexed (20,187 EN · 27,255 JP) | YOLO mAP50-95 0.964 | ~800 ms live scan | 512-dim CLIP embeddings

Key Features

  • 📷 Multi-card batch scanning
    Photograph a whole spread of cards; each one is detected, identified, and streamed back to the UI one-by-one as it resolves (first card in ~3 s on a Samsung S22+).
  • 🎥 Live continuous scanning
    Hold cards up to the camera and they're identified automatically into a running session list - no shutter tap, with a two-frame confirmation gate to reject phantom matches mid-swap.
  • 🤖 Three-mode recognition engine
    OCR text search, CLIP image matching across 47,442 card vectors, or both fused via Reciprocal Rank Fusion for the most accurate identification.
  • 🇯🇵 Automatic Japanese support
    A single photo of mixed English and Japanese cards is handled with no toggle; JP names are translated and matched against 27,255 first-class JP records (all eras, 1996–present).
  • 💰 Live pricing & sales
    Ungraded and PSA 7–10 prices, recent eBay/TCGPlayer sales, and trend charts scraped from PriceCharting and cached in Redis, with an instant USD/JPY toggle.
  • 🔍 Fuzzy search & disambiguation
    Partial names, set hints, and exclusion terms, with trigram matching to recover OCR misreads and a printed-total lookup to separate cards that share a name and number across sets.
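Reciprocal Rank Fusion, named in the recognition-engine feature above, merges ranked lists using only each candidate's position, so OCR text scores and CLIP similarity scores never have to be put on the same scale. The Python sketch below shows the general technique. The function name, the card IDs, and the constant k = 60 (a commonly used default) are assumptions; the app's actual parameters are not given here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of card IDs into one ordering.

    Each list contributes 1 / (k + rank) to a card's score, with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, card_id in enumerate(ranking, start=1):
            scores[card_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda card_id: (-scores[card_id], card_id))


# Hypothetical candidate lists from the two recognisers, best match first.
ocr_hits = ["card-a", "card-b", "card-c"]
clip_hits = ["card-b", "card-d", "card-a"]
fused = reciprocal_rank_fusion([ocr_hits, clip_hits])
```

In this example `card-b` ends up first: it is ranked second by OCR and first by CLIP, which narrowly beats `card-a` (first by OCR, third by CLIP). A card that both signals rank reasonably well tends to beat one that only a single signal likes.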

Engineering story

The interesting part of this project was the iteration. Each major rewrite started from a specific failure - here are the five biggest, as problem → action → outcome.

  • 🎯 Finding cards in a messy photo
  • Problem: my first detector inferred card positions by clustering OCR text - it failed completely on holofoil, face-down, or overlapping cards with no readable text.
    Action: trained a YOLO11n detector on 1,688 real images, then built a synthetic-data pipeline that pastes card art onto random desk and display-case backgrounds with auto-generated bounding boxes (zero manual labeling).
    Outcome: detection became robust to surface finish and overlap, and bounding-box accuracy (mAP50-95) rose from 0.904 to 0.964 - giving the matcher cleaner crops.
  • 🧠 Too little real data to train the visual matcher
  • Problem: an off-the-shelf image model gave near-random matches (it had never seen a phone photo of a card next to its clean scan), and I didn't have enough real labeled photos to fine-tune it.
    Action: generated synthetic training pairs instead - each clean card image augmented with perspective warp, color jitter, blur, JPEG artifacts, and random backgrounds to mimic real scanning conditions - and fine-tuned CLIP's visual encoder on them.
    Outcome: recognition held up across different lighting and angles, lifting match scores from a near-random 0.45–0.58 to 0.78–0.86 on distinctive cards.
  • 🇯🇵 A broken shortcut for Japanese cards
  • Problem: I first derived Japanese card images by mapping JP numbers onto English sets - but EN and JP numbering are unrelated, so every JP match returned the wrong artwork.
    Action: re-architected the data model, scraping 27,255 JP cards as independent records with their own images, numbers, and sets, and made every search filter by language.
    Outcome: fully independent EN and JP pipelines - mixed-language scans now resolve each card to the correct artwork automatically.
  • ⚡ Three slow round-trips per scan
  • Problem: a multi-card scan made three sequential network calls (detect → OCR search → image search), so the UI only updated after all three finished.
    Action: collapsed them into a single /scan endpoint that batches all CLIP embeddings in one pass, runs database and OCR searches concurrently, and streams results as NDJSON.
    Outcome: the first card now appears while the last is still processing, instead of waiting for the whole batch.
  • 🎥 Live scan too slow to feel real-time
  • Problem: the original live mode used a two-step loop with a stability hold - 2–3 seconds per card, hitting the backend twice each cycle.
    Action: rewrote it as a single round-trip (one snapshot → one backend call handling both detection and recognition), removing the on-device step entirely.
    Outcome: cycle time dropped to ~700–900 ms, making continuous hold-and-scan feel instant.
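The streaming behaviour in the round-trips story (first card appears while the last is still processing) can be sketched with plain asyncio: start identification for every crop concurrently and emit one JSON line as each finishes. This is an illustration of the NDJSON pattern, not the actual /scan handler. `stream_scan`, `fake_identify`, and the payload fields are hypothetical names, and the sleep stands in for the real embedding and database work.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, Dict, List


async def stream_scan(
    crops: List[Dict],
    identify: Callable[[int, Dict], Awaitable[Dict]],
) -> AsyncIterator[str]:
    """Yield one NDJSON line per card, in completion order rather than input order."""
    tasks = [asyncio.create_task(identify(i, crop)) for i, crop in enumerate(crops)]
    for finished in asyncio.as_completed(tasks):
        result = await finished
        # NDJSON: one JSON document per line, terminated by a newline.
        yield json.dumps(result) + "\n"


async def fake_identify(index: int, crop: Dict) -> Dict:
    """Stand-in for per-card recognition; the delay simulates uneven lookup times."""
    await asyncio.sleep(crop["delay"])
    return {"index": index, "name": crop["name"]}
```

Each line carries the card's original index so the client can place results in the right slot even though they arrive out of order. In FastAPI, an async generator like this can be handed to `StreamingResponse` with a media type such as `application/x-ndjson`; whether the project wires it up exactly that way is an assumption.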

📖 Want the full depth? The complete engineering write-up - step-by-step scan internals, model training logs and metrics, known limitations, and third-party attributions - lives in the GitHub README.


Tech Stack

  • Mobile: TypeScript, React Native, Expo, Zustand, react-native-vision-camera, react-native-fast-tflite (on-device YOLO), Google ML Kit (on-device OCR)
  • Backend: Python, FastAPI, SQLAlchemy (async), open-clip-torch, Ultralytics YOLO, Playwright + BeautifulSoup, asyncio
  • Data & caching: PostgreSQL + pgvector (47,442 embeddings, IVFFlat index), pg_trgm fuzzy search, Redis
  • Infra & sources: Docker Compose, pokemontcg.io, TCGCollector, PriceCharting, Frankfurter API
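To show why pg_trgm-style trigram matching recovers OCR misreads, here is a small Python approximation of its similarity measure: pad each word with two leading spaces and one trailing space (as the pg_trgm documentation describes), collect the three-character substrings, and divide the shared trigrams by the total distinct trigrams. This is a simplified sketch, not the extension itself. It only handles ASCII letters and digits, and the function names are made up for the example.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Return the set of padded, lowercased trigrams for every word in text."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# An OCR misread ("i" read as "l") versus an unrelated name.
misread_score = similarity("Charizard", "Charlzard")
unrelated_score = similarity("Charizard", "Pikachu")
```

A single wrong character only disturbs the three trigrams that contain it, so the misread name still shares 7 of 13 distinct trigrams with the correct one (about 0.54), comfortably above pg_trgm's default similarity threshold of 0.3, while the unrelated name shares none.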

Built on open models - a fine-tuned YOLO11n (AGPL-3.0) and CLIP ViT-B/32 (MIT) - with community training data. Full attributions and licenses are in the README.