AI & automation
AI pipeline that turns saved videos into a searchable library
A tool that captures the short videos you save on your phone, transcribes and analyzes each one with several AI models, and turns them into a searchable, summarized library with a weekly digest.
- Next.js 16
- Supabase + pgvector
- Inngest
- Groq Whisper
- Gemini
- Resend
We built this one for ourselves, to prove out a pattern we bring to client work. If you save a lot of short videos, they tend to vanish into a feed you never revisit. The pipeline captures each saved video with one tap, transcribes the audio, reads the visuals, scores the video, and files it into a searchable library, then emails a weekly summary of what you saved. It is the same kind of AI pipeline we build for clients, run end to end on our own data.
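As a rough illustration of that flow, here is a minimal TypeScript sketch of one video's trip through the pipeline. It is not the production code: `processVideo`, `PipelineDeps`, and the entry shapes are hypothetical, and the real model and storage calls are stood in by injected functions. Running transcription and frame reading in parallel is an assumption; the text only says both happen before the video is summarized and scored. In the real system these stages run as background jobs (Inngest is in the stack), and the sketch ignores retries and durability.

```typescript
// Hypothetical shape of one saved video entering the pipeline.
interface SavedVideo {
  id: string;
  url: string;
}

// Hypothetical output of the reasoning step.
interface Analysis {
  summary: string;
  tags: string[];
  score: number; // assumed numeric usefulness score
}

// What gets filed into the library (hypothetical shape).
interface LibraryEntry extends Analysis {
  id: string;
  url: string;
  transcript: string;
  visualNotes: string;
}

// Each stage is injected so real providers can be swapped for stubs.
// These signatures are illustrative assumptions, not a real SDK.
interface PipelineDeps {
  transcribe: (video: SavedVideo) => Promise<string>;
  describeFrames: (video: SavedVideo) => Promise<string>;
  analyze: (transcript: string, visualNotes: string) => Promise<Analysis>;
  store: (entry: LibraryEntry) => Promise<void>;
}

// Audio and visuals are independent, so they run in parallel; the reasoning
// step needs both, and the finished entry is then stored in the library.
async function processVideo(video: SavedVideo, deps: PipelineDeps): Promise<LibraryEntry> {
  const [transcript, visualNotes] = await Promise.all([
    deps.transcribe(video),
    deps.describeFrames(video),
  ]);
  const analysis = await deps.analyze(transcript, visualNotes);
  const entry: LibraryEntry = { id: video.id, url: video.url, transcript, visualNotes, ...analysis };
  await deps.store(entry);
  return entry;
}
```

Injecting the stages keeps the orchestration testable with stubs and makes it easy to wrap any single stage in a provider fallback without touching the rest.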
Challenges
Capturing a video had to be one tap from the phone, with no app to open and no friction.
Each video had to be transcribed, watched, and understood, which means coordinating several AI models, each good at a different part of the job.
Search had to understand meaning, not just keywords, so a video can be found by what it was actually about.
The whole pipeline had to cost effectively nothing to run at personal volume.
Approach
Built a one-tap capture from the phone's share menu that hands each saved video to a background pipeline, with no app to open.
Coordinated several AI models end to end: speech-to-text for the audio, a vision model to read the frames, and a reasoning model to summarize, tag, and score each video, with automatic fallback if one provider goes down.
Made the whole library searchable by meaning using vector search, so a video can be found by what it was about rather than its caption.
Grouped the saved videos into topics and sent a weekly digest of the highlights, all on free-tier infrastructure that costs effectively nothing at personal volume.
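The automatic fallback between AI providers can be pictured as an ordered list of providers for the same task, tried until one succeeds. The sketch below assumes that shape; `withFallback` and `Provider` are hypothetical names, and which vendors sit in which order is not stated here. It leaves out timeouts and the distinction between retryable and permanent errors, which a production version would need to decide on.

```typescript
// Anything that can attempt one task (e.g. a transcription or vision call
// to a single vendor). Hypothetical shape for illustration.
interface Provider<In, Out> {
  name: string;
  run: (input: In) => Promise<Out>;
}

// Try each provider in order and return the first success, along with the
// name of the provider that answered. If every provider fails, throw one
// error listing each failure so the job can be retried or inspected later.
async function withFallback<In, Out>(
  providers: Provider<In, Out>[],
  input: In,
): Promise<{ provider: string; result: Out }> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      const result = await p.run(input);
      return { provider: p.name, result };
    } catch (err) {
      failures.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Because every provider in a list shares one input and output type, each stage (speech-to-text, vision, reasoning) can get its own ordered list without the calling code caring which vendor answered; the returned `provider` name is handy for logging.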
Outcomes
One tap from the phone's share menu sends a video into the pipeline, with no app to open and nothing to file by hand.
Speech-to-text, a vision model, and a reasoning model work together, with automatic fallback, to transcribe, read, summarize, and score each video.
Vector search makes the whole library findable by what each video was actually about, not just the words in its caption.
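Searching by meaning comes down to comparing embeddings: the query and each video's text are turned into vectors, and the closest vectors rank first. With pgvector in the stack, that comparison would run inside Postgres (pgvector's `<=>` operator computes cosine distance); the in-memory sketch below shows the same ranking idea in plain TypeScript. `IndexedVideo` and `searchByMeaning` are hypothetical, the embedding model is not named here, and which text gets embedded (transcript, summary, or both) is an assumption, so embeddings are just number arrays.

```typescript
// Hypothetical library row with a precomputed embedding.
interface IndexedVideo {
  id: string;
  summary: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). 1 means same direction,
// 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank videos by similarity to the query embedding and keep the top k.
function searchByMeaning(query: number[], videos: IndexedVideo[], k = 3): IndexedVideo[] {
  return videos
    .map((v) => ({ v, score: cosineSimilarity(query, v.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ v }) => v);
}
```

A linear scan like this is fine for a personal-sized library; at larger volumes the point of pgvector is to do the ranking in the database, optionally with an approximate index.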
