Late Interaction on turbopuffer

What the namespaces hold, and how the query flows through them.

1. What lives in the database

Two namespaces. One dense vector per document. One ColBERT vector per token.

Namespace 1 late-interaction-test — dense

One row per document · 10,000 rows · 62 MB · 1536-dim OpenAI embeddings

id      vector (1536 dims)                       text
0       [ 0.013, -0.421, 0.087, ..., 0.204 ]     How do I make money online?
1       [ 0.092, 0.155, -0.301, ..., -0.088 ]    What are the best ways to earn money from home?
2       [-0.044, 0.288, 0.117, ..., 0.331 ]      How can I start a successful online business?
3       [ 0.211, -0.067, 0.392, ..., 0.018 ]     What programming languages should I learn first?
        ... 9,996 more rows ...
9999    [-0.173, 0.252, -0.041, ..., 0.110 ]     ...

Namespace 2 late-interaction-tokens-test — ColBERT tokens

One row per token · 157,736 rows · 83 MB · 128-dim ColBERT embeddings · doc_id is filterable

id      vector (128 dims)                  doc_id   token (not stored)
0       [ 0.18, -0.04, 0.22, ..., 0.09 ]   0        [CLS]
1       [ 0.09, 0.31, -0.12, ..., 0.27 ]   0        [D]
2       [-0.21, 0.08, 0.45, ..., -0.03 ]   0        how
3       [ 0.33, -0.17, 0.02, ..., 0.14 ]   0        do
4       [ 0.05, 0.29, -0.31, ..., 0.18 ]   0        i
5       [ 0.41, 0.13, 0.07, ..., -0.22 ]   0        make
6       [-0.08, 0.36, 0.19, ..., 0.31 ]    0        money
7       [ 0.22, -0.11, 0.28, ..., 0.06 ]   0        online
8       [ 0.14, 0.07, -0.05, ..., 0.19 ]   0        ?
9       [-0.18, 0.24, 0.31, ..., 0.02 ]    0        [SEP]
        ... IDs 10–999 unused (doc 0 has only 10 tokens) ...
1000    [ 0.27, -0.09, 0.14, ..., 0.33 ]   1        [CLS]
1001    [ 0.11, 0.33, -0.21, ..., 0.04 ]   1        [D]
1002    [-0.14, 0.19, 0.42, ..., -0.11 ]   1        what
        ... 12 more tokens for doc 1 ...
2000    [ 0.31, 0.05, -0.17, ..., 0.22 ]   2        [CLS]
        ... 157,722 more rows ...
ID SCHEME
token_id = doc_id × 1000 + tok_idx
doc 0 takes IDs 0–9, doc 1 takes 1000–1014, doc 42 takes 42000–42007, etc.
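The scheme is easy to express as a pair of helpers (a sketch; the function names are illustrative, only the arithmetic comes from the scheme above):

```python
TOKENS_PER_DOC = 1000  # fixed stride: each doc owns a block of 1000 IDs

def token_id(doc_id: int, tok_idx: int) -> int:
    # doc 42, token 7 -> 42007
    assert tok_idx < TOKENS_PER_DOC
    return doc_id * TOKENS_PER_DOC + tok_idx

def doc_of(token_id: int) -> int:
    # 42007 -> doc 42
    return token_id // TOKENS_PER_DOC
```

Integer division recovers the doc from any token row, which is why stage 2 can aggregate hits by document without storing anything extra.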

2. Query flow (the funnel)

You can't ColBERT-score every doc. So narrow the field with cheap dense ANN first, then apply ColBERT to the survivors.

Full corpus · 10,000 docs
↓ Stage 1: dense ANN against 1536-dim vectors · ~15 ms server-side
Top 100 candidates (rough ordering)
↓ Stage 2: ColBERT MaxSim against 128-dim token vectors · ~130 ms server-side
Top 10 reranked (precise ordering)

Stage 1 is fast but blurry — one similarity per doc — so the right answer might land at rank 47 instead of rank 1.
Stage 2 is slow but sharp — ~480 similarities per doc — and can only fit if we first narrow to 100 candidates.
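Stage 1's job in miniature, with random stand-in embeddings and brute-force search instead of ANN (a sketch just to show the shape of the funnel, not the real index):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 32))                  # stand-in doc embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# a query that is "near" doc 123 but not identical to it
query = corpus[123] + 0.05 * rng.normal(size=32)
query /= np.linalg.norm(query)

sims = corpus @ query                  # one similarity per doc — cheap but blurry
candidates = np.argsort(-sims)[:100]   # the 100 survivors stage 2 will rerank
```

The point of the sketch: stage 1 collapses each doc to a single number, so a relevant doc only has to land somewhere in the top 100 — stage 2 fixes the ordering.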


3. MaxSim, in tiny numbers

The ColBERT score for one (query, doc) pair is the sum of per-query-token best matches. Here it is with 3-dim vectors of roughly unit length, so the arithmetic is doable in your head.

Set up the vectors

Query: "make money"                      Doc A: "earn cash"
q1 = "make"  = [0.6, 0.8, 0.0]           d1 = "earn"  = [0.5, 0.7, 0.1]
q2 = "money" = [0.0, 0.5, 0.9]           d2 = "cash"  = [0.1, 0.4, 0.9]

Compute every pairwise dot product

Dot product of two unit vectors is a similarity in [–1, 1]. (Ours are only roughly unit length, so one value edges just past 1.)

              d1 = earn    d2 = cash
q1 = make       0.86         0.38
q2 = money      0.44         1.01

make · earn  = 0.6×0.5 + 0.8×0.7 + 0.0×0.1 = 0.86
make · cash  = 0.6×0.1 + 0.8×0.4 + 0.0×0.9 = 0.38
money · earn = 0.0×0.5 + 0.5×0.7 + 0.9×0.1 = 0.44
money · cash = 0.0×0.1 + 0.5×0.4 + 0.9×0.9 = 1.01

"Max": each query token keeps its best doc match

q1 = "make"  → max(0.86, 0.38) = 0.86   (best: "earn")
q2 = "money" → max(0.44, 1.01) = 1.01   (best: "cash")

"Sum": add up across query tokens

Doc A score = 0.86 + 1.01 = 1.87

Compare to another doc

Doc A: "earn cash"
make  → best match "earn"  = 0.86
money → best match "cash"  = 1.01
                       ────
              score = 1.87
Doc B: "buy shoes"
make  → best match "buy"   = 0.21
money → best match "shoes" = 0.18
                       ────
              score = 0.39

Doc A wins — because every query word found a strong match, even though "earn" and "cash" aren't the same words as "make" and "money". Dense retrieval averages this signal away; MaxSim keeps each word independent.
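The whole calculation is two array operations. A numpy sketch of Doc A's score:

```python
import numpy as np

Q = np.array([[0.6, 0.8, 0.0],    # q1 = "make"
              [0.0, 0.5, 0.9]])   # q2 = "money"
D = np.array([[0.5, 0.7, 0.1],    # d1 = "earn"
              [0.1, 0.4, 0.9]])   # d2 = "cash"

S = Q @ D.T               # every pairwise dot product (2 × 2 table)
maxes = S.max(axis=1)     # "max": each query token keeps its best doc match
score = maxes.sum()       # "sum": Doc A's MaxSim score, ≈ 1.87
```

`S.max(axis=1)` is the row-wise max — one best match per query token — which is exactly the table-then-max step worked through above.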

What real ColBERT does

vector dims      128 instead of 3
query tokens     32 (with [MASK] padding for "query expansion")
doc tokens       ~15 per Quora question, up to 180 per longer doc
pairwise table   32 × 15 = 480 dot products per candidate
score per doc    one number (sum of 32 row-wise maxes)
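The shape bookkeeping at real scale, with random stand-in vectors (a sketch; real ColBERT embeddings would come from the model):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(32, 128))    # 32 query-token vectors, 128 dims each
D = rng.normal(size=(15, 128))    # ~15 doc-token vectors for one Quora question

S = Q @ D.T                       # pairwise table: 32 × 15 = 480 dot products
score = S.max(axis=1).sum()       # one number per candidate doc
```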

4. How turbopuffer executes stage 2

Naive path: fetch every doc's token vectors to the client, compute MaxSim locally. Lots of network, lots of math.

Server-side path (what the guide uses): for each of the 32 query token vectors, send an ANN query into the token namespace, filtered to the 100 candidate doc IDs. turbopuffer returns the nearest doc tokens with their similarity already computed. Client just aggregates.

for each query_token q (32 total):
    hits = token_ns.query(
        rank_by=("vector", "ANN", q),
        filters=("doc_id", "In", candidate_doc_ids),
        top_k=1500
    )
    # hits contains (doc_id, similarity) for the nearest tokens;
    # best_per_doc keeps only each doc's highest-similarity token (the "max" in MaxSim)
    for doc_id, sim in best_per_doc(hits):
        scores[doc_id] += sim   # sum across the 32 query tokens

Those 32 queries are batched 16 at a time into multi_query, so stage 2 is 2 API calls regardless of candidate count. No raw doc vectors cross the network.
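The client-side aggregation (`best_per_doc` plus the sum) and the batching are plain Python. A self-contained sketch with mocked hits — the real `(doc_id, similarity)` pairs come back from turbopuffer:

```python
from collections import defaultdict

def chunks(seq, size=16):
    """Yield successive size-sized slices: 32 token queries -> 2 batches."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Mocked hits, one list per query token (values are made up for illustration)
hits_per_query_token = [
    [(0, 0.91), (0, 0.72), (273, 0.80)],   # query token 1
    [(273, 0.95), (0, 0.88), (42, 0.60)],  # query token 2
]

scores = defaultdict(float)
for hits in hits_per_query_token:
    best = {}                              # MaxSim: best-matching token per doc
    for doc_id, sim in hits:
        best[doc_id] = max(best.get(doc_id, float("-inf")), sim)
    for doc_id, sim in best.items():
        scores[doc_id] += sim              # sum across query tokens
# doc 0 scores 0.91 + 0.88 = 1.79; doc 273 scores 0.80 + 0.95 = 1.75
```

Note that a doc hit by several tokens of the same query contributes only its best one per query token — that is the "max" — while missing a query token entirely simply contributes nothing to that doc's sum.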


5. Putting it together for one query

Query: "How can I make money online free of cost?"

Stage 1 — dense returns top 100, ordered by cosine similarity:
  rank  doc_id  text
  ──────────────────────────────────────────────────────────
  1     273     "How do I earn money online without investment?"
  2     891     "Best ways to make passive income"
  3     42      "Free online income opportunities"
  ...
  10    0       "How do I make money online?"          ← actual best match
  ...
  47    9921    "Online business ideas for beginners"
  ...
  100   5043    "How to budget your monthly expenses"

Stage 2 — ColBERT rescores those 100:
  rank  doc_id  colbert_score
  ──────────────────────────────────────────────────────────
  1     0       14.7   "How do I make money online?"   ← promoted from rank 10
  2     273     13.9   "How do I earn money online..."
  3     42      13.2   "Free online income opportunities"
  ...

ColBERT promoted doc 0 because every query word (make, money, online) found an exact token match. Dense ranked it #10 because the extra query words (free, of, cost) diluted the averaged similarity. MaxSim doesn't care about averages — each query word picks its own best match, independently.

Built as part of the turbopuffer late interaction guide · full guide →