TCG Card Scanner
A mobile app that identifies trading cards by pointing a camera at them and instantly retrieves pricing and recent sales data. Supports both English and Japanese Pokémon cards across all eras.
Multi-scan - photograph up to 20 cards at once
Live scan - continuous 1-by-1 identification in real time
Note: price fetch times for newly scanned cards reflect live web scraping latency. This would be eliminated with a dedicated pricing API - an intentional upgrade path if the project scales.
TL;DR - Two main use cases:
- Display case / binder scan: photograph an entire spread of cards at once. The app detects each card, identifies it, and streams prices back one-by-one as they resolve - no waiting for the full batch.
- Bulk 1-by-1 scan: hold cards up to the camera in sequence. Each card is identified automatically and added to a running session list in real time, with a total value updating as you go.
Both modes use fine-tuned on-device and server-side ML models: a custom-trained YOLO11n for card detection, and a fine-tuned CLIP visual embedding model for image-based identification - neither is a third-party recognition API.
The app fuses three signals to identify a card: YOLO detects each card's boundaries, ML Kit reads its text on-device, and a fine-tuned CLIP model matches it visually. Prices, recent sales, and trend graphs are scraped live from PriceCharting and cached in Redis. It grew across 31 iterations - from a single-card OCR scanner to a multi-model streaming pipeline - with each version answering a concrete failure in the one before it.
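The scrape-then-cache behaviour described above can be pictured as a cache-aside lookup. This is a minimal sketch, not the project's actual code: the `price:<card_id>` key scheme, the one-hour TTL, and the `get_price` and `scrape` names are all assumptions made for illustration, and an in-memory class stands in for Redis so the example runs without a server. The only client calls it relies on are `get` and `setex`, which redis-py provides with the same positional argument order.

```python
import json
import time
from typing import Callable, Optional, Protocol


class Cache(Protocol):
    """The two redis-py style calls this sketch relies on."""

    def get(self, name: str) -> Optional[str]: ...
    def setex(self, name: str, ttl: int, value: str) -> object: ...


class InMemoryCache:
    """Stand-in for Redis so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def setex(self, name: str, ttl: int, value: str) -> bool:
        self._data[name] = (time.monotonic() + ttl, value)
        return True


def get_price(
    card_id: str,
    cache: Cache,
    scrape: Callable[[str], dict],
    ttl_seconds: int = 3600,  # assumed TTL, not taken from the project
) -> dict:
    key = f"price:{card_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no scraping latency
    prices = scrape(card_id)  # cache miss: pay the live scrape cost once
    cache.setex(key, ttl_seconds, json.dumps(prices))
    return prices
```

Under this pattern only the first lookup of a card pays the scraping latency, which matches the note that slow price fetches apply to newly scanned cards.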
Key Features
- 📷 Multi-card batch scanning
- Photograph a whole spread of cards; each one is detected, identified, and streamed back to the UI one-by-one as it resolves (first card in ~3s on a Samsung S22+).
- 🎥 Live continuous scanning
- Hold cards up to the camera and they're identified automatically into a running session list - no shutter tap, with a two-frame confirmation gate to reject phantom matches mid-swap.
- 🤖 Three-mode recognition engine
- OCR text search, CLIP image matching across 47,442 card vectors, or both fused via Reciprocal Rank Fusion for the most accurate identification.
- 🇯🇵 Automatic Japanese support
- A single photo of mixed English and Japanese cards is handled with no toggle; JP names are translated and matched against 27,255 first-class JP records (all eras, 1996–present).
- 💰 Live pricing & sales
- Ungraded and PSA 7–10 prices, recent eBay/TCGPlayer sales, and trend charts scraped from PriceCharting and cached in Redis, with an instant USD/JPY toggle.
- 🔍 Fuzzy search & disambiguation
- Partial names, set hints, and exclusion terms, with trigram matching to recover OCR misreads and a printed-total lookup to separate cards that share a name and number across sets.
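The fused recognition mode in the list above names Reciprocal Rank Fusion. As a rough illustration of the technique, the sketch below merges two ranked candidate lists (one from OCR text search, one from CLIP image matching) by summing `1 / (k + rank)` for each card. The constant `k = 60` is the value commonly used for RRF; whether the app uses it, and the card IDs shown, are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first rankings into one list of (card_id, score)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, card_id in enumerate(ranking, start=1):
            # A card earns more for appearing near the top of any list.
            scores[card_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate lists, best match first.
ocr_hits = ["card-a", "card-b", "card-c"]
clip_hits = ["card-b", "card-d", "card-a"]
fused = reciprocal_rank_fusion([ocr_hits, clip_hits])
```

Here `card-b` ends up first because it ranks highly in both lists, even though OCR alone preferred `card-a`. RRF uses only rank positions, so the text and image scores never need to be put on a common scale.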
Engineering Story
The interesting part of this project was the iteration. Each major rewrite started from a specific failure - here are the five biggest, as problem → action → outcome.
- 🎯 Finding cards in a messy photo
Problem: my first detector inferred card positions by clustering OCR text - it failed completely on holofoil, face-down, or overlapping cards with no readable text.
Action: trained a YOLO11n detector on 1,688 real images, then built a synthetic-data pipeline that pastes card art onto random desk and display-case backgrounds with auto-generated bounding boxes (zero manual labeling).
Outcome: detection became robust to surface finish and overlap, and bounding-box accuracy (mAP50-95) rose from 0.904 to 0.964 - giving the matcher cleaner crops.
- 🧠 Too little real data to train the visual matcher
Problem: an off-the-shelf image model gave near-random matches (it had never seen a phone photo of a card next to its clean scan), and I didn't have enough real labeled photos to fine-tune it.
Action: generated synthetic training pairs instead - each clean card image augmented with perspective warp, color jitter, blur, JPEG artifacts, and random backgrounds to mimic real scanning conditions - and fine-tuned CLIP's visual encoder on them.
Outcome: recognition held up across different lighting and angles, lifting match scores from a near-random 0.45–0.58 to 0.78–0.86 on distinctive cards.
- 🇯🇵 A broken shortcut for Japanese cards
Problem: I first derived Japanese card images by mapping JP numbers onto English sets - but EN and JP numbering are unrelated, so every JP match returned the wrong artwork.
Action: re-architected the data model, scraping 27,255 JP cards as independent records with their own images, numbers, and sets, and made every search filter by language.
Outcome: fully independent EN and JP pipelines - mixed-language scans now resolve each card to the correct artwork automatically.
- ⚡ Three slow round-trips per scan
Problem: a multi-card scan made three sequential network calls (detect → OCR search → image search), so the UI only updated after all three finished.
Action: collapsed them into a single /scan endpoint that batches all CLIP embeddings in one pass, runs database and OCR searches concurrently, and streams results as NDJSON.
Outcome: the first card now appears while the last is still processing, instead of waiting for the whole batch.
- 🎥 Live scan too slow to feel real-time
Problem: the original live mode used a two-step loop with a stability hold - 2–3 seconds per card, hitting the backend twice each cycle.
Action: rewrote it as a single round-trip (one snapshot → one backend call handling both detection and recognition), removing the on-device step entirely.
Outcome: cycle time dropped to ~700–900 ms, making continuous hold-and-scan feel instant.
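The streaming behaviour described under "Three slow round-trips per scan" can be sketched with plain asyncio. This is an illustrative outline rather than the project's endpoint: `identify` is a hypothetical stand-in for the per-card work, and its delays simulate cards that resolve at different speeds. The part that matters is `asyncio.as_completed`, which yields each result as soon as it finishes so that one NDJSON line per card can be written immediately.

```python
import asyncio
import json
from typing import AsyncIterator


async def identify(crop_id: int, delay: float) -> dict:
    """Hypothetical per-card work (embedding lookup, OCR search, pricing)."""
    await asyncio.sleep(delay)  # simulated latency
    return {"crop": crop_id, "card": f"card-{crop_id}"}


async def stream_results(delays: list[float]) -> AsyncIterator[str]:
    """Yield one NDJSON line per card, in completion order."""
    tasks = [asyncio.create_task(identify(i, d)) for i, d in enumerate(delays)]
    for finished in asyncio.as_completed(tasks):
        result = await finished
        yield json.dumps(result) + "\n"  # NDJSON: one JSON object per line


async def collect(delays: list[float]) -> list[str]:
    """Helper for trying the generator outside a web server."""
    return [line async for line in stream_results(delays)]
```

In a FastAPI backend, a generator like this could be passed to `StreamingResponse` with an NDJSON media type; that wiring is one possible approach and is not confirmed by the write-up. The client then parses each line as it arrives rather than waiting for a complete JSON array.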
Want the full depth? The complete engineering write-up - step-by-step scan internals, model training logs and metrics, known limitations, and third-party attributions - lives in the GitHub README.
Read the full write-up on GitHub →
Tech Stack
- Mobile: TypeScript, React Native, Expo, Zustand, react-native-vision-camera, react-native-fast-tflite (on-device YOLO), Google ML Kit (on-device OCR)
- Backend: Python, FastAPI, SQLAlchemy (async), open-clip-torch, Ultralytics YOLO, Playwright + BeautifulSoup, asyncio
- Data & caching: PostgreSQL + pgvector (47,442 embeddings, IVFFlat index), pg_trgm fuzzy search, Redis
- Infra & sources: Docker Compose, pokemontcg.io, TCGCollector, PriceCharting, Frankfurter API
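To give a feel for how the pg_trgm fuzzy search in the stack recovers OCR misreads, the sketch below approximates its similarity measure in plain Python. Each lowercased word is padded with two leading spaces and one trailing space, split into three-character windows, and the two trigram sets are compared as shared over total. It is an approximation for illustration only: it handles ASCII letters and digits, the real extension runs inside PostgreSQL, and the misread string is invented.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set in the style of pg_trgm (ASCII-only approximation)."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical OCR misread: the final "d" was read as "cl".
score = similarity("Charizard", "charizarcl")
```

The misread shares 8 of 13 distinct trigrams with the correct name, giving a score of about 0.62, comfortably above pg_trgm's default similarity threshold of 0.3. An unrelated name such as "Blastoise" shares none and scores 0.0.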
Built on open models - a fine-tuned YOLO11n (AGPL-3.0) and CLIP ViT-B/32 (MIT) - with community training data. Full attributions and licenses are in the README.