Honest 10-way comparison of vector databases — scale and recall (1M vs 100M vs 1B vectors · QPS · p99 latency · index strategy) across Pinecone · Weaviate · Qdrant · Milvus/Zilliz · Chroma · pgvector · Turbopuffer · MongoDB Atlas Vector · Vespa · LanceDB. No vendor sponsorship. Calling matrix by buyer persona below — an operator's read on which one to pick when you're forced to pick.
Honest read on positioning, ideal customer, and where each one is the wrong call. No vendor sponsorship, no affiliate links — operator-grade signal.
Strong across the full scale spectrum (1M to billion-vector workloads) with consistent sub-50ms p99 latency on hosted serverless. 1M vectors: trivial, sub-30ms p99. 100M vectors: serverless tier handles cleanly with ~30-50ms p99. 1B+ vectors: Enterprise tier with dedicated capacity + multi-region replication. Recall: HNSW-based with custom ANN tuning; well-tuned defaults give 95%+ recall at sub-50ms latency. The default hosted production pick when scale is unpredictable and you want one engine that doesn't require re-architecting at each scale milestone.
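For orientation, a minimal sketch of the hosted serverless path, assuming the current pinecone Python SDK; the index name, region, and 1536-dim embeddings are placeholders, not a recommendation:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Serverless index: no capacity planning at 1M, same API at 100M.
pc.create_index(
    name="docs-prod",                      # placeholder name
    dimension=1536,                        # e.g. OpenAI text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("docs-prod")
index.upsert(vectors=[{"id": "doc-1", "values": [0.1] * 1536, "metadata": {"team": "search"}}])
hits = index.query(vector=[0.1] * 1536, top_k=10, include_metadata=True)
print(hits)
```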
Strong at 1M-500M vectors with sub-100ms p99 on hosted, multi-tenant architecture scales horizontally for SaaS workloads. 1M vectors: trivial, sub-50ms p99. 100M vectors: solid with proper sharding, ~50-100ms p99. 500M-1B vectors: requires careful tuning + multi-shard architecture, p99 climbs. Multi-tenant: scales to thousands of tenants per cluster with namespace isolation. Recall: HNSW + BM25 fusion gives strong hybrid recall.
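A rough sketch of the hybrid (BM25 + vector) query path, assuming the v4 weaviate-client and a collection named Document with a vectorizer configured; names are placeholders:

```python
import weaviate

client = weaviate.connect_to_local()       # or connect_to_weaviate_cloud(...) for hosted
docs = client.collections.get("Document")  # placeholder collection

# alpha blends the two scorers: 0.0 = pure BM25, 1.0 = pure vector.
res = docs.query.hybrid(query="quarterly revenue guidance", alpha=0.5, limit=10)
for obj in res.objects:
    print(obj.properties)

client.close()
```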
Strong at 1M-200M vectors with sub-100ms p99, particularly strong on filtered vector search (vector + metadata predicate in one query). 1M vectors: trivial on a single $20/mo VPS. 100M vectors: handles cleanly on a $200-500/mo Kubernetes cluster. 200M-1B vectors: requires careful sharding + collection design, p99 climbs without GPU acceleration. Recall: HNSW with payload-aware indexing gives strong filtered-recall performance — Qdrant is purpose-built for 'find similar vectors WITH these metadata filters' as one operation, not vector-then-filter.
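A minimal sketch of that single-operation filtered search, assuming the qdrant-client Python SDK; the collection name, payload field, and embedding dimension are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.0] * 1536             # placeholder vector

# Vector similarity and the metadata predicate evaluated as one operation.
hits = client.search(
    collection_name="products",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="shoes"))]
    ),
    limit=10,
)
print(hits)
```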
Purpose-built for billion-vector enterprise scale — GPU-accelerated indexing (CAGRA, IVF-PQ on GPU) handles workloads other engines can't reach. 1M-100M vectors: overkill (operational complexity not justified). 100M-1B vectors: shines, with multiple index types tunable per workload (HNSW for accuracy, DiskANN for cost-efficient disk indexing, IVF-PQ for speed, GPU-CAGRA for throughput). 1B+ vectors: the only realistic OSS option besides Vespa. Recall: highest in category at scale due to per-workload index choice and GPU acceleration.
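A sketch of the per-workload index choice, assuming pymilvus and an existing collection with an embedding field; the parameters are illustrative, not tuned:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
col = Collection("docs")                   # assumes the collection already exists

# HNSW for accuracy-sensitive workloads; the same schema can instead take
# "IVF_PQ" (speed/memory) or "GPU_CAGRA" (GPU throughput) as index_type.
col.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},
    },
)
```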
Strong at 0.1M-10M vectors in embedded mode — the prototyping + small-production sweet spot. 0.1M vectors: trivial, sub-10ms p99 in-process. 1M vectors: solid embedded performance. 10M vectors: approaching the embedded mode practical ceiling — Chroma Cloud + distributed architecture in development for higher scale. 100M+ vectors: not the right tool — use Pinecone, Qdrant, or Milvus. Recall: HNSW supported, less tuned than purpose-built engines at scale.
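What embedded mode looks like in practice, as a minimal sketch with the chromadb Python package; IDs, documents, and the 384-dim embeddings are placeholders:

```python
import chromadb

# In-process client persisted to local disk: no server to run.
client = chromadb.PersistentClient(path="./chroma-data")
collection = client.get_or_create_collection("notes")

collection.add(
    ids=["n1", "n2"],
    documents=["refund policy draft", "onboarding checklist"],
    embeddings=[[0.1] * 384, [0.2] * 384],  # bring your own embeddings
)
results = collection.query(query_embeddings=[[0.1] * 384], n_results=2)
print(results["ids"])
```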
Strong at 0.1M-50M vectors with HNSW index — performance scales with the Postgres compute tier you provision. 0.1M-1M vectors: trivial on a $25/mo Supabase Pro tier. 1M-10M vectors: handles cleanly with HNSW index on appropriately sized Postgres. 10M-50M vectors: requires careful Postgres tuning + larger compute tier. 50M+ vectors: recall + QPS start to lose to purpose-built engines. The 'one less dependency' scale story has a real ceiling — but for 90% of use cases that ceiling is higher than the workload requires.
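A minimal sketch of the HNSW path on plain Postgres, assuming the pgvector extension is available and using psycopg from Python; table and column names are placeholders:

```python
import psycopg

conn = psycopg.connect("postgresql://localhost/appdb")
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " id bigserial PRIMARY KEY,"
        " embedding vector(1536));"
    )
    # HNSW index with the cosine-distance operator class.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS items_embedding_hnsw "
        "ON items USING hnsw (embedding vector_cosine_ops);"
    )
    conn.commit()

    query_vec = "[" + ",".join(["0.0"] * 1536) + "]"   # pgvector text format
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 10;",
        (query_vec,),
    )
    print(cur.fetchall())
```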
Designed for huge-corpus low-QPS workloads — scales to 10B+ vectors on object storage with 100-300ms p99 query latency. 10M vectors: works but Pinecone simpler at this scale. 100M-1B vectors: Turbopuffer's sweet spot — cold-storage economics dominate when query rate is low. 1B-10B+ vectors: cleanest path in category for cold-storage scale (Vespa + Milvus alternatives but more operational overhead). Latency: 100-300ms p99 vs Pinecone's 30-50ms — the trade-off is the entire point.
Strong at 1M-100M vectors using Atlas Search's HNSW implementation — scales with the MongoDB Atlas cluster tier you provision. 1M vectors: trivial on M10+ Atlas cluster. 10M-50M vectors: solid with appropriate cluster sizing. 50M-100M vectors: requires larger cluster tier + careful sharding. 100M+ vectors: not the right architecture — Pinecone or Milvus designed for that. Recall: HNSW comparable to purpose-built at moderate scale; degrades vs Milvus + Pinecone at billion scale.
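A sketch of the $vectorSearch aggregation stage via pymongo, assuming an Atlas cluster with a vector search index already defined on the embedding field; the connection string, index, and field names are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://USER:PASS@cluster0.example.mongodb.net")  # placeholder URI
coll = client["app"]["documents"]

pipeline = [
    {
        "$vectorSearch": {
            "index": "embedding_index",     # name of the Atlas vector search index
            "path": "embedding",
            "queryVector": [0.0] * 1536,    # placeholder query embedding
            "numCandidates": 200,
            "limit": 10,
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]
results = list(coll.aggregate(pipeline))
print(results)
```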
The most battle-tested billion-doc-scale engine in the list — runs Yahoo Mail (billions of documents), Spotify recommendations, OkCupid matching at production scale for over a decade. 1M-100M documents: overkill. 100M-1B documents: shines, with hybrid lexical + vector + ML-ranking in one query at sub-200ms p99. 1B-10B+ documents: Vespa's actual lane — distributed architecture purpose-built for search-engine scale. Recall: strong on hybrid retrieval; ML-ranking models run in-engine for second-stage reranking.
Strong at 1M-100M vectors with multi-modal data in same storage layer — Lance columnar format optimizes for ML-workload access patterns (random access + versioning + analytics queries). 1M vectors: trivial in embedded mode. 10M-100M vectors: solid with serverless cloud + object-storage backend. 100M+ vectors: serverless mode emerging, ceiling depends on workload. Recall: IVF + HNSW on Lance format — competitive at 1M-100M scale for multi-modal workloads.
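A minimal embedded-mode sketch with the lancedb Python package; the table, 512-dim vectors, and captions are placeholders standing in for multi-modal data:

```python
import lancedb

db = lancedb.connect("./lance-data")        # local dir; object-store URIs also work
table = db.create_table(
    "frames",
    data=[
        {"id": 1, "vector": [0.1] * 512, "caption": "sunset over bay"},
        {"id": 2, "vector": [0.2] * 512, "caption": "city skyline"},
    ],
)
hits = table.search([0.1] * 512).limit(5).to_list()
print(hits)
```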
Most comparison sites refuse to force-rank because their revenue depends on staying neutral. SideGuy ranks because it doesn't take vendor money. Here's the call by buyer persona.
Your problem: You're at small scale today — 0.1M-10M vectors, single-tenant, 1-100 QPS. Pick wrong and you'll re-architect in 12 months. Pick right and the engine grows with you to Series A. See the Vector Databases megapage for the full operator-honest cluster.
Your problem: Real production scale. 10M-100M vectors, 100-1K QPS, sub-100ms p99 budget. Customer-facing AI features depend on this engine. Pair with the AI Infrastructure megapage for the model-substrate decision.
Your problem: 100M-1B vectors across multiple teams + use cases. Hybrid search (vector + keyword + metadata filter) load-bearing. Sub-200ms p99 budget. Single-engine standardization being evaluated. Coordinate with the Compliance Authority Graph for SOC 2 + DPA across multi-team usage.
Your problem: Billion-vector enterprise scale. Multiple teams, multi-cloud reality, 5-year substrate decision. Recall + QPS at scale dominates the decision. AI-baked-in vs AI-bolted-on matters at this horizon. See /operator cockpit for the operator-layer view.
These rankings are SideGuy's lived-data + observed-buyer-pattern read as of 2026-05-11. They're directional, not gospel. The right answer for YOUR specific situation may diverge — text PJ for a 10-min operator-honest read on your actual buying context.
Vendor pricing + features + market positioning shift quarterly. SideGuy may earn referral commissions from some of these vendors, but rankings are independent — affiliate relationships never change rank order. Sister doctrines: /open/ live operator dashboard · install packs · operator network.
Or skip all of them. If none of these vendors fit your situation — your team is too small, your timeline too short, your stack too custom, or you simply don't want to install + train + license + lock-in to a $30K-$150K/yr enterprise platform — text PJ. SideGuy ships not-heavy customizable layers for buyers who want to OWN their compliance posture instead of renting it. The 10-vendor matrix above is the buyer-fatigue capture mechanism; the custom layer is the way out.
Every vector DB at scale trades recall (% of true nearest neighbors found) against latency. Default index settings give 90-95% recall at sub-100ms latency for most workloads. Pushing recall to 98-99% typically doubles or triples latency. Pinecone, Weaviate, Qdrant, and Milvus all expose recall-vs-latency tuning knobs (HNSW efSearch, IVF nprobe, etc.) — measure on YOUR workload, don't trust marketing benchmark numbers. The honest 2026 default: 95% recall at sub-100ms p99 is the production sweet spot for most AI features. If you need 99%+ recall (e.g. legal discovery, medical search), accept the 200-500ms p99 cost or use a smaller-corpus exact search.
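One way to measure the trade-off on your own data rather than trusting vendor numbers, sketched with hnswlib and random vectors as stand-ins; hnswlib's set_ef is its efSearch knob, and the numbers below are illustrative only:

```python
import time

import hnswlib
import numpy as np

dim, n, k = 768, 50_000, 10
data = np.random.rand(n, dim).astype(np.float32)
queries = data[:500]

# Exact nearest neighbors by brute force, for recall ground truth.
d2 = (queries**2).sum(1)[:, None] + (data**2).sum(1)[None, :] - 2.0 * queries @ data.T
true_ids = np.argsort(d2, axis=1)[:, :k]

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data)

for ef in (16, 64, 256):                    # the recall-vs-latency knob
    index.set_ef(ef)
    t0 = time.perf_counter()
    labels, _ = index.knn_query(queries, k=k)
    ms = (time.perf_counter() - t0) / len(queries) * 1000
    recall = np.mean([len(set(l) & set(t)) / k for l, t in zip(labels, true_ids)])
    print(f"ef={ef:4d}  recall@{k}={recall:.3f}  avg latency={ms:.2f} ms/query")
```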
Rough operator-honest sizing (varies by embedding dimension, recall target, query rate): 1M vectors at 1536-dim (OpenAI ada-002 / text-embedding-3-small): ~6GB raw + ~2-4GB HNSW index = single-node trivial on $20-50/mo VPS or Pinecone free tier. 10M vectors: ~60GB raw + ~20-40GB index = mid-range single-node ($200-500/mo VPS or Pinecone Standard). 100M vectors: ~600GB raw + ~200-400GB index = small distributed cluster ($2K-5K/mo Kubernetes or Pinecone Enterprise tier). 1B vectors: ~6TB raw + ~2-4TB index = serious distributed cluster ($10K-50K+/mo of nodes, GPU-accelerated indexing typically required, or Zilliz Cloud Dedicated / Pinecone Enterprise tier). Cold-storage workloads (Turbopuffer) trade these compute costs for object-storage costs which are ~10-100x cheaper at large scale.
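The arithmetic behind those estimates, as a back-of-envelope helper; the 50% index overhead is an assumption that varies with M, efConstruction, and quantization:

```python
def rough_footprint_gb(num_vectors: int, dim: int = 1536,
                       bytes_per_float: int = 4,
                       index_overhead: float = 0.5) -> tuple[float, float]:
    """Raw float32 storage plus an assumed ~50% HNSW index overhead."""
    raw = num_vectors * dim * bytes_per_float / 1e9
    return raw, raw * index_overhead

for n in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    raw, idx = rough_footprint_gb(n)
    print(f"{n:>13,} vectors: ~{raw:,.0f} GB raw + ~{idx:,.0f} GB index")
```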
Workload typically dictates the engine until ~50M vectors — at small-to-medium scale most engines work; pick based on hybrid needs, multi-tenant needs, embedded vs hosted, and DX. Scale starts to dictate the engine around 100M vectors — purpose-built billion-scale engines (Milvus/Zilliz, Vespa, Pinecone Enterprise) become structurally necessary; pgvector and MongoDB Atlas Vector start to lose on $/QPS, Chroma + LanceDB hit embedded-mode ceilings. By 1B+ vectors the shortlist compresses to 4 realistic options: Pinecone (hosted), Milvus/Zilliz (OSS or Zilliz Cloud), Vespa (search-engine workloads), Turbopuffer (cold-storage). Everything else either can't scale that far or can't compete on $/vector at that scale.
Pinecone Enterprise + Multi-region (multiple cloud regions, automatic replication) — strongest native support. Zilliz Cloud Dedicated tier supports multi-region. Weaviate Cloud Services + Enterprise BYOC can deploy multi-region with engineering effort. Qdrant + Milvus self-host can be deployed multi-region with manual replication architecture. pgvector inherits Postgres replication (which is real but operationally complex for multi-region). MongoDB Atlas Vector inherits Atlas multi-region (real and operationally polished). Turbopuffer + LanceDB serverless inherit object-storage multi-region (cheap but adds latency on cold queries). The honest 2026 default for global low-latency AI products: Pinecone Enterprise + multi-region replication is the cleanest path; everything else requires meaningful engineering investment.
10-minute operator-honest read on your actual buying context. No deck, no demo call, no signup. If we're not the right fit, we'll say so.
📱 Text PJ · 858-461-8054. Skip the 5 vendor demos. 30-day delivery. No procurement cycle. No demo theater. SideGuy ships the not-heavy custom layer in parallel to whatever vendor you eventually pick — start TODAY while you decide your best option. Custom builds in 30 days →
📱 Urgent? Text PJ · 858-461-8054. I'm almost positive I can help. If I can't, you don't pay.
No signup. No seminar. No bullshit.
Don't see what you were looking for?
Text PJ a sentence about what you actually need — I'll build you a free custom shareable on the house. No email, no funnel, no SOW.
📲 Text PJ — free shareable