Design your vector database

Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)

Scene 15 · Design your vector database

Capstone: pick the index, filter, hybrid, and sharding for RAG vs recommendation vs semantic cache vs billion-on-a-budget — each knob traceable to the scene that justified it.

Previously

The same primitives — vector, metric, index, filter, hybrid, shards — configure radically different systems. The capstone is choosing them deliberately for a real workload, with every knob traceable to the scene that justified it, and the trilemma as the compass.

Diagram

The capstone canvas. Four workload cards are docked across the top — each carries a one-line constraints summary (corpus size, latency budget, RAM, how much recall it can forgive). One card is live at a time. Down the side sits the palette of every knob the arc earned: index type, metric, hybrid, sharding, re-rank. As you choose knobs, the verifier panel grades each one against the LIVE workload — green 'fits', amber 'wasteful' (you're paying on an axis this workload doesn't constrain), red 'violation' (it breaks a hard limit) — and every verdict cites the scene that justifies it.
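The verifier's three verdicts can be sketched as a small grading function. This is a minimal sketch under stated assumptions, not the canvas's actual logic: the `Workload` and `Estimate` types, the `SLACK` factor, and every number in the usage lines are hypothetical, and the scene citation attached to each verdict on the canvas is reduced here to a plain reason string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hard limits read off a workload card (hypothetical fields)."""
    name: str
    min_recall: float      # lowest recall the workload can forgive
    max_latency_ms: float  # latency budget
    max_ram_gb: float      # memory budget


@dataclass(frozen=True)
class Estimate:
    """What a chosen configuration is expected to deliver (hypothetical)."""
    recall: float
    latency_ms: float
    ram_gb: float


# Assumption: more than 10x headroom on a budget suggests the configuration
# paid elsewhere (recall, memory, or cost) for something the workload
# does not need. The factor is illustrative only.
SLACK = 10.0


def grade(w: Workload, e: Estimate) -> tuple[str, str]:
    """Return (verdict, reason): 'violation', 'wasteful', or 'fits'."""
    # Red: any broken hard limit is a violation.
    if e.recall < w.min_recall:
        return "violation", "recall is below what the workload can forgive"
    if e.latency_ms > w.max_latency_ms:
        return "violation", "latency exceeds the budget"
    if e.ram_gb > w.max_ram_gb:
        return "violation", "memory exceeds the budget"
    # Amber: crude heuristic for paying on an unconstrained axis.
    if e.latency_ms * SLACK < w.max_latency_ms:
        return "wasteful", "far faster than this workload requires"
    if e.ram_gb * SLACK < w.max_ram_gb:
        return "wasteful", "far smaller than this workload requires"
    # Green: inside every limit without excessive headroom.
    return "fits", "within every limit"


# Hypothetical semantic-cache card: strict recall, tight latency.
cache = Workload("Semantic cache", min_recall=0.95,
                 max_latency_ms=20.0, max_ram_gb=64.0)

print(grade(cache, Estimate(recall=0.97, latency_ms=8.0, ram_gb=40.0))[0])  # fits
print(grade(cache, Estimate(recall=0.90, latency_ms=8.0, ram_gb=40.0))[0])  # violation
print(grade(cache, Estimate(recall=0.97, latency_ms=1.0, ram_gb=40.0))[0])  # wasteful
```

Hard limits are checked first so that a configuration which both breaks a limit and overshoots another axis is reported as a violation rather than as merely wasteful.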

four workloads, one toolkit

Here is everything the arc built, in one place. Four workload cards are docked across the top: *RAG over docs*, *Recommendation*, *Semantic cache*, and *Billion-on-a-budget*. Each card states its real constraints: how many vectors, how tight the latency budget, how much RAM, and how much a wrong answer actually costs. Down the side is the palette: the *index* choices (Flat, IVF, HNSW, IVFPQ), the *metric* choice, *hybrid* (dense + sparse fused by RRF), *sharding*, and *re-rank*. Every one of those knobs was earned in an earlier scene. The job now is not to learn anything new. It is to locate each workload's constraints on the *trilemma* (recall, latency, memory) and pick the configuration that honestly fits them. The same toolkit; four different right answers.
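The memory corner of the trilemma is the easiest one to check with arithmetic. The sketch below estimates raw storage for three of the index choices on a billion-vector corpus. The dimensionality (768), the PQ code size (64 one-byte sub-quantizer codes), and the HNSW `M` of 16 are illustrative assumptions, not figures from the workload cards. The HNSW term uses the common approximation of about `2 * M` four-byte links per vector, and the PQ figure ignores codebooks, IDs, and inverted-list overhead.

```python
def flat_bytes(n: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw float32 vectors, as a Flat index stores them."""
    return n * dim * bytes_per_float


def hnsw_bytes(n: int, dim: int, m: int, bytes_per_link: int = 4) -> int:
    """Full vectors plus roughly 2*M graph links each (approximation)."""
    return flat_bytes(n, dim) + n * 2 * m * bytes_per_link


def pq_bytes(n: int, code_bytes: int) -> int:
    """Product-quantized codes only; codebooks and IDs are ignored."""
    return n * code_bytes


GB = 10**9  # decimal gigabytes, to keep the arithmetic readable

n, dim = 1_000_000_000, 768  # hypothetical billion-vector corpus

print(flat_bytes(n, dim) / GB)      # 3072.0
print(hnsw_bytes(n, dim, 16) / GB)  # 3200.0
print(pq_bytes(n, 64) / GB)         # 64.0
```

Under these assumptions the uncompressed options need about 3 TB of memory while the PQ codes need 64 GB, a 48x reduction that is paid for in recall. That trade is why the memory axis tends to decide the configuration for a budget-limited billion-vector workload, and why a re-rank step over full-precision vectors is the usual way to buy some of the recall back.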
