Multimodal AI Intern
Vardera Labs
Software Engineering, Data Science
Boston, MA, USA · Remote
Posted on May 19, 2025
Interested in this role? Email us at careers@varderalabs.com
About the Role
Join Vardera Labs as a Multimodal AI Intern! You'll help build cutting-edge systems that understand and reason over both images and text, focused specifically on coins and collectibles submitted to auction platforms and resale sites. You'll work with computer vision models, large language models (LLMs), and tools that turn messy image and text data into clean, structured, actionable outputs. Whether you're a pro or have only used PyTorch once, let's chat.
This role starts as soon as you're ready!
About Us
Vardera Labs is building the next generation of AI infrastructure for collectible marketplaces, starting with coins and expanding to everything else. We're a funded, product-focused startup led by founders experienced in AI. Our mission is to bring trust, transparency, and intelligence to high-value resale markets.
What You’ll Work On
Train and fine-tune vision-language models (e.g., Qwen, LLaMA) to extract structured metadata from item images
Help improve image quality detection using classical vision techniques (e.g., blur detection)
Build and validate structured JSON outputs using tools like Pydantic or Guardrails
Explore model evaluation strategies, including visual mismatch detection and anomaly scoring
Technologies You Might Use
PyTorch, Transformers, OpenCV
LLaMA, ViT, Qwen
FAISS, Pydantic, Hugging Face
Docker, MLflow, GCP
Who You Are
A current student or recent grad in CS, ML, Computer Vision, or a related field
Comfortable with Python and familiar with deep learning concepts
Curious about multimodal AI and applying it to real-world messy data
Bonus: experience with image data, embeddings, or LLMs
Bonus: interest or experience in collectibles, resale markets, or authentication problems
What You’ll Get
Mentorship from a founder with deep AI/ML experience
A chance to ship real product-facing ML tools
Flexibility (hybrid or remote-friendly)
Resume-building experience with state-of-the-art vision and language models