TCG Card Scanner
A mobile app that identifies trading cards when you point a camera at them and instantly retrieves pricing and recent sales data.
The app runs a multi-model AI pipeline entirely from a phone photo: YOLO detects individual card boundaries in the frame, ML Kit reads the card text, and a fine-tuned CLIP model matches the card visually. All three results are fused to produce the most accurate identification. Prices, recent sales, and trend graphs are scraped live from PriceCharting and cached in Redis. The app supports both English and Japanese cards.
The project grew from a simple single-card OCR scanner into a full multi-card batch pipeline with a streaming backend, custom-trained object detection, and a fine-tuned visual embedding model, with each version addressing the limitations of the last.
Key Features
- Multi-card batch scanning
- Capture an entire spread of cards in one photo. The app detects each card's bounding box, crops and re-OCRs each one independently, then streams results back to the UI card by card as they complete, so there is no waiting for the whole batch to finish.
- Three-mode recognition engine
- OCR mode reads card text (name, number, set total) and searches the database. Image AI mode converts the card crop into a 512-dimensional visual embedding and finds the nearest match across 20,187 stored card vectors. Combined mode runs both in parallel and merges the ranked results using Reciprocal Rank Fusion, with OCR results weighted 2× relative to the image signal.
- Live pricing from PriceCharting
- Ungraded, PSA 7, PSA 8, PSA 9, and PSA 10 prices are scraped per card and cached in Redis for 24 hours. Recent sales rows include direct links to the original eBay or TCGPlayer listings. Price trends are rendered as a chart pulled from PriceCharting's embedded JavaScript data.
- Batch price lookup
- After scanning, select multiple cards and fetch market prices for all of them at once. Each card shows its market price and most recent sale entry on a single screen.
- Japanese card support
- ML Kit Japanese OCR reads katakana text, which is translated to English via a 1,028-entry kana-to-English dictionary. Japanese card images are resolved from a scraped index of 27,255 cards from TCGCollector.com. Pricing uses PriceCharting's Japanese set URL scheme.
- Fuzzy name matching
- OCR misreads like "Lotacl" or "Sulcune" are recovered by a PostgreSQL trigram similarity fallback (pg_trgm) when exact name search returns nothing. Set disambiguation uses a printed-total lookup across 172 sets to distinguish cards that share a name and number across different releases.
- Unified streaming backend
- A single POST /scan endpoint batches all CLIP embeddings in one forward pass, runs all pgvector and OCR searches in parallel via asyncio, and streams results as NDJSON. The mobile client parses the stream incrementally over XHR, updating the UI as each card resolves.
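The rank merging used by Combined mode can be illustrated with a minimal sketch of weighted Reciprocal Rank Fusion. The function name `weighted_rrf`, the smoothing constant `k=60` (a commonly used RRF value, not one stated by this project), and the card IDs are assumptions for illustration; only the 2× OCR weight comes from the feature description above.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of card IDs with weighted Reciprocal Rank Fusion.

    rankings: dict mapping a source name to its card IDs, best match first
    weights:  dict mapping a source name to its multiplier
    k:        smoothing constant (assumed value, see note above)
    """
    scores = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights.get(source, 1.0)
        for rank, card_id in enumerate(ranked_ids, start=1):
            # Each source contributes weight / (k + rank) to the card's score
            scores[card_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical candidate lists from the two recognisers
rankings = {
    "ocr": ["base1-4", "base1-5", "base2-4"],
    "image": ["base2-4", "base1-4", "xy1-11"],
}
fused = weighted_rrf(rankings, weights={"ocr": 2.0, "image": 1.0})
print(fused)  # ['base1-4', 'base2-4', 'base1-5', 'xy1-11']
```

In this toy input, `base2-4` is the top image match but only third by OCR, so the doubled OCR weight keeps `base1-4` in first place.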
AI Models & Training
- YOLO11n - Card Detection (fine-tuned)
Fine-tuned from COCO pretrained weights to detect card boundaries in a phone photo. Replaced an earlier OpenCV contour approach that failed on holofoil surfaces and overlapping cards.
Dataset assembled from three sources: 221 self-captured photos (Roboflow, auto-labeled), 576 images from the TCG Detector Roboflow dataset, and 891 images from Aaron's Raw Photos dataset, for 1,688 images total after format conversion and merging.
Training config: 50 epochs · imgsz 640 · batch 16 · AMD Ryzen 5 5600X (CPU only) · 3.68 hours
Final results (best.pt):

| Metric | Value | Target |
| --- | --- | --- |
| mAP50 | 0.992 | > 0.85 |
| mAP50-95 | 0.904 | > 0.70 |
| Precision | 0.977 | - |
| Recall | 0.985 | - |
| Inference speed | 33.9 ms/image (CPU) | - |

- CLIP ViT-B/32 - Visual Embedding (fine-tuned)
CLIP converts a card crop into a 512-dimensional vector. pgvector finds the nearest match across all 20,187 embedded cards using an IVFFlat index. The base CLIP model was pre-trained on general internet images; it understands visual categories but not specific card identity, so it confuses visually similar Pokémon.
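As a rough illustration of the nearest-match lookup, the sketch below ranks stored vectors by cosine distance with a brute-force scan. It assumes cosine distance as the metric (the project does not state which pgvector distance operator it uses), and it substitutes toy 4-dimensional vectors and hypothetical card IDs for the real 512-dimensional embeddings. An IVFFlat index answers the same kind of query approximately, scanning only the clusters closest to the query instead of every row.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, so 0.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_cards(query, stored, top_k=3):
    """Exact nearest-neighbour search over a dict of card_id -> vector."""
    scored = [(card_id, cosine_distance(query, vec)) for card_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Toy stand-ins for stored card embeddings (hypothetical IDs and values)
stored = {
    "card-a": [1.0, 0.0, 0.0, 0.0],
    "card-b": [0.9, 0.1, 0.0, 0.0],
    "card-c": [0.0, 1.0, 0.0, 0.0],
}
query = [1.0, 0.05, 0.0, 0.0]  # embedding of the scanned crop

matches = nearest_cards(query, stored, top_k=2)
print([card_id for card_id, _ in matches])  # ['card-a', 'card-b']
```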
Fine-tuning closed this domain gap by training on (clean official card art, simulated phone photo) pairs using InfoNCE contrastive loss. Augmentation pipeline: paste card onto random background texture → perspective warp → color jitter → Gaussian blur → JPEG compression → art-region crop (y = 12% to 52%). Only the visual encoder was fine-tuned (87.8M parameters); the text encoder was frozen.
Training config: 10 epochs · 82,964 pairs/epoch (20,741 cards × 4 augmentations) · AdamW lr=1e-5 · cosine LR schedule · RTX 3080 · ~13 hours total
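For readers unfamiliar with InfoNCE, here is a minimal pure-Python sketch of the loss over a batch of embedding pairs. It is a simplified one-directional form (clean art to simulated photo only), and the temperature of 0.07 is an assumed value; the project does not state its temperature or whether it uses a symmetric variant. Inputs are assumed to be L2-normalised.

```python
import math


def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss for a batch of L2-normalised embedding pairs.

    anchors[i] and positives[i] embed the same card; every other
    positive in the batch acts as a negative for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every positive, scaled by temperature
        logits = [
            sum(a * p for a, p in zip(anchors[i], positives[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp for the softmax denominator
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Cross-entropy with the matching pair (index i) as the target
        total += log_denominator - logits[i]
    return total / n


# Two toy pairs: correctly matched versus deliberately swapped
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

The loss is near zero when each photo embedding sits closest to its own card art, and grows when another card in the batch is closer, which is what pushes visually similar cards apart during fine-tuning.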
Epoch log:

| Epoch | Loss | LR | Duration |
| --- | --- | --- | --- |
| 1 | 0.0255 | 9.76e-06 | 78 min |
| 2 | 0.0098 | 9.05e-06 | 77 min |
| 3 | 0.0099 | 7.96e-06 | 78 min |
| 4 | 0.0095 | 6.58e-06 | 78 min |
| 5 | 0.0088 | 5.05e-06 | 76 min |
| 6 | 0.0080 | 3.52e-06 | 77 min |
| 7 (best) | 0.0077 | 2.14e-06 | 77 min |
| 8 | 0.0081 | 1.05e-06 | 77 min |
| 9 | 0.0083 | 3.42e-07 | 79 min |
| 10 | 0.0081 | 1.00e-07 | 82 min |
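The LR column is consistent with per-epoch cosine annealing from the stated base rate of 1e-5 down to a floor of 1e-7. The floor and the per-epoch stepping are inferred from the logged values rather than stated in the training config, so treat the sketch below as a reconstruction, not the actual training code.

```python
import math


def cosine_lr(epoch, total_epochs=10, base_lr=1e-5, min_lr=1e-7):
    """Learning rate after `epoch` completed epochs of cosine annealing.

    min_lr=1e-7 is an assumption inferred from the final logged value.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


for epoch in range(1, 11):
    print(f"epoch {epoch:2d}  lr {cosine_lr(epoch):.2e}")
```

Under these assumptions the printed values match the LR column of the epoch log, from 9.76e-06 at epoch 1 to 1.00e-07 at epoch 10.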
Best weights saved at epoch 7 (loss 0.0077). Re-embedded all 20,187 cards with fine-tuned weights; IVFFlat index rebuilt. 50 cards unembeddable due to CDN 404s (McDonald's Collection promos).
- Google ML Kit OCR
- On-device text recognition for English and Japanese scripts. Runs entirely on the phone, with no network call and no network round-trip latency. Extracts card name, HP, card number, and set total. Misreads are recovered downstream via fuzzy matching.
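To show why trigram similarity recovers OCR misreads, the sketch below approximates in plain Python how pg_trgm scores two strings: lowercase each word, pad it with two leading spaces and one trailing space, collect the 3-character windows, and divide the number of shared trigrams by the number of distinct trigrams overall. This is an illustration rather than the project's query; it handles ASCII letters and digits only, and the candidate names are chosen for the example.

```python
import re


def trigrams(text):
    """Set of padded trigrams for each alphanumeric word in the text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Score an OCR misread against a few candidate names
candidates = ["Suicune", "Sunkern", "Lucario"]
best = max(candidates, key=lambda name: similarity("Sulcune", name))
print(best, round(similarity("Sulcune", best), 3))  # Suicune 0.455
```

A score of about 0.455 clears pg_trgm's default similarity threshold of 0.3, so a misread such as "Sulcune" would still surface "Suicune" when the exact name search returns nothing.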
Tech Stack
Mobile (Frontend)
- TypeScript
- React Native
- Expo
- Zustand - global scan state
- react-native-chart-kit - price trend graphs
- Google ML Kit - on-device OCR
Backend
- Python
- FastAPI
- open-clip-torch - CLIP ViT-B/32 inference and fine-tuning
- Ultralytics - YOLO11n inference
- Playwright - TCGCollector.com scraper (Japanese card images)
- BeautifulSoup - PriceCharting scraper
Database & Caching
- PostgreSQL with pgvector - card metadata and 512-dim embeddings, IVFFlat index
- Redis - price cache (24h TTL), search cache (1h TTL)
Infrastructure
- Docker / Docker Compose - local dev environment
- pokemontcg.io API - card metadata and images
- PriceCharting - pricing, sales history, trend data (scraped)