What the namespaces hold, and how the query flows through them.
Two namespaces. One dense vector per document. One ColBERT vector per token.
late-interaction-test — dense
One row per document · 10,000 rows · 62 MB · 1536-dim OpenAI embeddings
| id | vector (1536 dims) | text |
|---|---|---|
| 0 | [ 0.013, -0.421, 0.087, ..., 0.204 ] | How do I make money online? |
| 1 | [ 0.092, 0.155, -0.301, ..., -0.088 ] | What are the best ways to earn money from home? |
| 2 | [-0.044, 0.288, 0.117, ..., 0.331 ] | How can I start a successful online business? |
| 3 | [ 0.211, -0.067, 0.392, ..., 0.018 ] | What programming languages should I learn first? |
| ... 9,996 more rows ... | ||
| 9999 | [-0.173, 0.252, -0.041, ..., 0.110 ] | ... |
late-interaction-tokens-test — ColBERT tokens
One row per token · 157,736 rows · 83 MB · 128-dim ColBERT embeddings · doc_id is filterable
| id | vector (128 dims) | doc_id | token (not stored) |
|---|---|---|---|
| 0 | [ 0.18, -0.04, 0.22, ..., 0.09 ] | 0 | [CLS] |
| 1 | [ 0.09, 0.31, -0.12, ..., 0.27 ] | 0 | [D] |
| 2 | [-0.21, 0.08, 0.45, ..., -0.03 ] | 0 | how |
| 3 | [ 0.33, -0.17, 0.02, ..., 0.14 ] | 0 | do |
| 4 | [ 0.05, 0.29, -0.31, ..., 0.18 ] | 0 | i |
| 5 | [ 0.41, 0.13, 0.07, ..., -0.22 ] | 0 | make |
| 6 | [-0.08, 0.36, 0.19, ..., 0.31 ] | 0 | money |
| 7 | [ 0.22, -0.11, 0.28, ..., 0.06 ] | 0 | online |
| 8 | [ 0.14, 0.07, -0.05, ..., 0.19 ] | 0 | ? |
| 9 | [-0.18, 0.24, 0.31, ..., 0.02 ] | 0 | [SEP] |
| ... ids 10–999 unused (doc 0 has only 10 tokens) ... |||
| 1000 | [ 0.27, -0.09, 0.14, ..., 0.33 ] | 1 | [CLS] |
| 1001 | [ 0.11, 0.33, -0.21, ..., 0.04 ] | 1 | [D] |
| 1002 | [-0.14, 0.19, 0.42, ..., -0.11 ] | 1 | what |
| ... 12 more tokens for doc 1 ... | |||
| 2000 | [ 0.31, 0.05, -0.17, ..., 0.22 ] | 2 | [CLS] |
| ... 157,722 more rows ... | |||
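The id column above follows a fixed stride: each document owns a block of 1,000 ids, so doc 1's tokens start at 1000 and doc 2's at 2000, with unused slots left empty. A tiny helper (hypothetical name, not part of the guide's code) makes the convention explicit:

```python
def token_row_id(doc_id: int, token_index: int, stride: int = 1000) -> int:
    # Each document owns a block of `stride` ids; ids past the document's
    # last token are simply never written.
    assert token_index < stride, "document exceeds the per-doc id block"
    return doc_id * stride + token_index

print(token_row_id(1, 0))  # 1000 — doc 1's [CLS] token, matching the table
```

The stride also means a token row's doc_id is recoverable as `id // stride`, though the namespace stores doc_id explicitly so it can be filtered server-side.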
You can't ColBERT-score every doc. So narrow the field with cheap dense ANN first, then apply ColBERT to the survivors.
Stage 1 is fast but blurry — one similarity per doc — so the right answer might land at rank 47 instead of rank 1.
Stage 2 is slow but sharp — ~480 similarities per doc — and is affordable only because stage 1 has already narrowed the field to 100 candidates.
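The two-stage shape can be sketched with brute-force cosine search standing in for the dense ANN query (NumPy, toy dimensions and random data in place of the real embeddings; `candidate_doc_ids` is the hand-off to stage 2):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 32))               # stand-in for 1536-dim embeddings
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[42] + rng.normal(scale=0.1, size=32)  # a query close to doc 42
query /= np.linalg.norm(query)

# Stage 1: cheap and blurry — one similarity per document
sims = docs @ query
candidate_doc_ids = np.argsort(-sims)[:100]        # the 100 survivors

# Stage 2 (ColBERT MaxSim) rescores only these 100, not all 10,000;
# doc 42 survives stage 1 among the candidates even if not at rank 1
```

In the real pipeline stage 1 is a single ANN query against the dense namespace rather than a full matrix product; the hand-off of 100 doc ids is the same.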
The ColBERT score for one (query, doc) pair is the sum of per-query-token best matches. Here it is with 3-dim unit vectors so the arithmetic is doable in your head.
Query: "make money"                 Doc A: "earn cash"
q1 = "make"  = [0.6, 0.8, 0.0]      d1 = "earn" = [0.5, 0.7, 0.1]
q2 = "money" = [0.0, 0.5, 0.9]      d2 = "cash" = [0.1, 0.4, 0.9]
The dot product of two unit vectors is their cosine similarity, a value in [–1, 1]. (These toy vectors are only approximately unit length, which is why one similarity below slightly exceeds 1.)
| | d1 = earn | d2 = cash |
|---|---|---|
| q1 = make | 0.86 | 0.38 |
| q2 = money | 0.44 | 1.01 |
make · earn  = 0.6×0.5 + 0.8×0.7 + 0.0×0.1 = 0.86
make · cash  = 0.6×0.1 + 0.8×0.4 + 0.0×0.9 = 0.38
money · earn = 0.0×0.5 + 0.5×0.7 + 0.9×0.1 = 0.44
money · cash = 0.0×0.1 + 0.5×0.4 + 0.9×0.9 = 1.01
q1 = "make"  → max(0.86, 0.38) = 0.86 (best: "earn")
q2 = "money" → max(0.44, 1.01) = 1.01 (best: "cash")
Doc A score = 0.86 + 1.01 = 1.87
Doc A: "earn cash"
make  → best match "earn"  = 0.86
money → best match "cash"  = 1.01
────
score = 1.87
Doc B: "buy shoes"
make  → best match "buy"   = 0.21
money → best match "shoes" = 0.18
────
score = 0.39
Doc A wins — because every query word found a strong match, even though "earn" and "cash" aren't the same words as "make" and "money". Dense retrieval averages this signal away; MaxSim keeps each word independent.
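The per-pair MaxSim score is a few lines of NumPy. This sketch reuses the Doc A vectors from the example above (the similarity matrix, the per-row max, and the sum map directly onto the arithmetic shown):

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    sims = query_vecs @ doc_vecs.T        # one row per query token
    return float(sims.max(axis=1).sum())  # best match per row, then sum

query = np.array([[0.6, 0.8, 0.0],       # "make"
                  [0.0, 0.5, 0.9]])      # "money"
doc_a = np.array([[0.5, 0.7, 0.1],       # "earn"
                  [0.1, 0.4, 0.9]])      # "cash"

print(round(maxsim(query, doc_a), 2))    # 1.87
```

The same function scales to the real shapes — a 32×128 query matrix against a ~480×... wait, per-doc token matrix — without any code change.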
Naive path: fetch every doc's token vectors to the client, compute MaxSim locally. Lots of network, lots of math.
Server-side path (what the guide uses): for each of the 32 query token vectors, send an ANN query into the token namespace, filtered to the 100 candidate doc IDs. turbopuffer returns the nearest doc tokens with their similarity already computed. Client just aggregates.
from collections import defaultdict

scores = defaultdict(float)
for q in query_token_vecs:          # the 32 ColBERT query-token vectors
    hits = token_ns.query(
        rank_by=("vector", "ANN", q),
        filters=("doc_id", "In", candidate_doc_ids),
        top_k=1500,
    )
    # hits contains (doc_id, similarity) for the nearest tokens;
    # keep each doc's single best token match for this query token
    for doc_id, sim in best_per_doc(hits):
        scores[doc_id] += sim       # sum across the 32 query tokens
Those 32 queries are batched 16 at a time into multi_query, so stage 2 is 2 API calls regardless of candidate count. No raw doc vectors cross the network.
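The `best_per_doc` helper is the only non-obvious client-side piece: within one query token's hit list, it keeps only the highest-scoring token per document (the "Max" in MaxSim) before the outer loop sums across query tokens. A runnable sketch with made-up hit lists (two query tokens, two candidate docs, invented numbers):

```python
from collections import defaultdict

def best_per_doc(hits):
    # hits: (doc_id, similarity) pairs from one query token's ANN call;
    # keep only the highest-scoring token per document
    best = {}
    for doc_id, sim in hits:
        if sim > best.get(doc_id, float("-inf")):
            best[doc_id] = sim
    return best.items()

# Toy ANN results for two query tokens (numbers invented for illustration)
hits_per_query_token = [
    [(0, 0.86), (0, 0.38), (273, 0.71)],
    [(0, 1.01), (0, 0.44), (273, 0.40)],
]

scores = defaultdict(float)
for hits in hits_per_query_token:
    for doc_id, sim in best_per_doc(hits):
        scores[doc_id] += sim   # sum of per-token best matches

# doc 0 accumulates 0.86 + 1.01; doc 273 accumulates 0.71 + 0.40
```

One caveat this sketch surfaces: with top_k=1500 a document's weaker tokens can appear in the hit list too, so deduplicating to the best token per doc is required for a correct MaxSim sum.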
Query: "How can I make money online free of cost?"

Stage 1 — dense returns top 100, ordered by cosine similarity:

rank   doc_id   text
──────────────────────────────────────────────────────────
1      273      "How do I earn money online without investment?"
2      891      "Best ways to make passive income"
3      42       "Free online income opportunities"
...
10     0        "How do I make money online?"   ← actual best match
...
47     9921     "Online business ideas for beginners"
...
100    5043     "How to budget your monthly expenses"

Stage 2 — ColBERT rescores those 100:

rank   doc_id   colbert_score
──────────────────────────────────────────────────────────
1      0        14.7   "How do I make money online?"   ← promoted from rank 10
2      273      13.9   "How do I earn money online..."
3      42       13.2   "Free online income opportunities"
...
ColBERT promoted doc 0 because every query word (make, money, online) found an exact token match. Dense ranked it #10 because the extra query words (free, of, cost) diluted the averaged similarity. MaxSim doesn't care about averages — each query word picks its own best match, independently.
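The dilution effect is easy to reproduce: mean-pool token vectors into a single dense vector and the extra query words drag the similarity down, while the per-token maxima that MaxSim sums are untouched. A sketch with 2-dim token vectors invented purely for illustration (mean-pooling stands in for a real single-vector embedding model):

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical token embeddings, 2-dim for illustration only
make, money, online = unit([1.0, 0.0]), unit([0.9, 0.4]), unit([0.0, 1.0])
free, cost = unit([-0.5, 0.2]), unit([-0.6, -0.1])

doc = np.stack([make, money, online])             # doc tokens: an exact match
short_q = np.stack([make, money, online])         # "make money online"
long_q = np.stack([make, money, online, free, cost])  # extra query words

def dense_sim(q_tokens, d_tokens):
    # single-vector model approximated by mean-pooling token vectors
    return float(unit(q_tokens.mean(axis=0)) @ unit(d_tokens.mean(axis=0)))

def maxsim(q_tokens, d_tokens):
    return float((q_tokens @ d_tokens.T).max(axis=1).sum())

# The averaged similarity drops when "free"/"cost" are added...
assert dense_sim(long_q, doc) < dense_sim(short_q, doc)
# ...but each matching word still scores a perfect 1.0 under MaxSim
assert (long_q[:3] @ doc.T).max(axis=1).round(6).tolist() == [1.0, 1.0, 1.0]
```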
Built as part of the turbopuffer late interaction guide · full guide →