Honest 10-way comparison of vector databases — scale and recall (1M vs 100M vs 1B vectors · QPS · p99 latency · index strategy) across Pinecone · Weaviate · Qdrant · Milvus/Zilliz · Chroma · pgvector · Turbopuffer · MongoDB Atlas Vector · Vespa · LanceDB. No vendor sponsorship. Calling matrix by buyer persona below — an operator's read on which one to pick when you're forced to pick.
Honest read on positioning, ideal customer, and where each one is the wrong call. No vendor sponsorship, no affiliate links — operator-grade signal.
Strong across the full scale spectrum (1M to billion-vector workloads) with consistent sub-50ms p99 latency on hosted serverless. 1M vectors: trivial, sub-30ms p99. 100M vectors: serverless tier handles cleanly with ~30-50ms p99. 1B+ vectors: Enterprise tier with dedicated capacity + multi-region replication. Recall: HNSW-based with custom ANN tuning; well-tuned defaults give 95%+ recall at sub-50ms latency. The default hosted production pick when scale is unpredictable and you want one engine that doesn't require re-architecting at each scale milestone.
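For orientation, a minimal sketch of the hosted serverless path, assuming the current pinecone Python SDK; the index name, region, and 1536-dim embeddings are placeholders, not a recommendation:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Serverless index: no capacity planning at 1M, same API at 100M.
pc.create_index(
    name="docs-prod",                      # placeholder name
    dimension=1536,                        # e.g. OpenAI text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("docs-prod")
index.upsert(vectors=[{"id": "doc-1", "values": [0.1] * 1536, "metadata": {"team": "search"}}])
hits = index.query(vector=[0.1] * 1536, top_k=10, include_metadata=True)
print(hits)
```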
Strong at 1M-500M vectors with sub-100ms p99 on hosted, multi-tenant architecture scales horizontally for SaaS workloads. 1M vectors: trivial, sub-50ms p99. 100M vectors: solid with proper sharding, ~50-100ms p99. 500M-1B vectors: requires careful tuning + multi-shard architecture, p99 climbs. Multi-tenant: scales to thousands of tenants per cluster with namespace isolation. Recall: HNSW + BM25 fusion gives strong hybrid recall.
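A rough sketch of the hybrid (BM25 + vector) query path, assuming the v4 weaviate-client and a collection named Document with a vectorizer configured; names are placeholders:

```python
import weaviate

client = weaviate.connect_to_local()       # or connect_to_weaviate_cloud(...) for hosted
docs = client.collections.get("Document")  # placeholder collection

# alpha blends the two scorers: 0.0 = pure BM25, 1.0 = pure vector.
res = docs.query.hybrid(query="quarterly revenue guidance", alpha=0.5, limit=10)
for obj in res.objects:
    print(obj.properties)

client.close()
```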
Strong at 1M-200M vectors with sub-100ms p99, particularly strong on filtered vector search (vector + metadata predicate in one query). 1M vectors: trivial on a single $20/mo VPS. 100M vectors: handles cleanly on a $200-500/mo Kubernetes cluster. 200M-1B vectors: requires careful sharding + collection design, p99 climbs without GPU acceleration. Recall: HNSW with payload-aware indexing gives strong filtered-recall performance — Qdrant is purpose-built for 'find similar vectors WITH these metadata filters' as one operation, not vector-then-filter.
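A minimal sketch of that single-operation filtered search, assuming the qdrant-client Python SDK; the collection name, payload field, and embedding dimension are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.0] * 1536             # placeholder vector

# Vector similarity and the metadata predicate evaluated as one operation.
hits = client.search(
    collection_name="products",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="shoes"))]
    ),
    limit=10,
)
print(hits)
```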
Purpose-built for billion-vector enterprise scale — GPU-accelerated indexing (CAGRA, IVF-PQ on GPU) handles workloads other engines can't reach. 1M-100M vectors: overkill (operational complexity not justified). 100M-1B vectors: shines, with multiple index types tunable per workload (HNSW for accuracy, DiskANN for cost-efficient disk indexing, IVF-PQ for speed, GPU-CAGRA for throughput). 1B+ vectors: the only realistic OSS option besides Vespa. Recall: highest in category at scale due to per-workload index choice and GPU acceleration.
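A sketch of the per-workload index choice, assuming pymilvus and an existing collection with an embedding field; the parameters are illustrative, not tuned:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
col = Collection("docs")                   # assumes the collection already exists

# HNSW for accuracy-sensitive workloads; the same schema can instead take
# "IVF_PQ" (speed/memory) or "GPU_CAGRA" (GPU throughput) as index_type.
col.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},
    },
)
```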
Strong at 0.1M-10M vectors in embedded mode — the prototyping + small-production sweet spot. 0.1M vectors: trivial, sub-10ms p99 in-process. 1M vectors: solid embedded performance. 10M vectors: approaching the embedded mode practical ceiling — Chroma Cloud + distributed architecture in development for higher scale. 100M+ vectors: not the right tool — use Pinecone, Qdrant, or Milvus. Recall: HNSW supported, less tuned than purpose-built engines at scale.
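What embedded mode looks like in practice, as a minimal sketch with the chromadb Python package; IDs, documents, and the 384-dim embeddings are placeholders:

```python
import chromadb

# In-process client persisted to local disk: no server to run.
client = chromadb.PersistentClient(path="./chroma-data")
collection = client.get_or_create_collection("notes")

collection.add(
    ids=["n1", "n2"],
    documents=["refund policy draft", "onboarding checklist"],
    embeddings=[[0.1] * 384, [0.2] * 384],  # bring your own embeddings
)
results = collection.query(query_embeddings=[[0.1] * 384], n_results=2)
print(results["ids"])
```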
Strong at 0.1M-50M vectors with HNSW index — performance scales with the Postgres compute tier you provision. 0.1M-1M vectors: trivial on a $25/mo Supabase Pro tier. 1M-10M vectors: handles cleanly with HNSW index on appropriately sized Postgres. 10M-50M vectors: requires careful Postgres tuning + larger compute tier. 50M+ vectors: recall + QPS start to lose to purpose-built engines. The 'one less dependency' scale story has a real ceiling — but for 90% of use cases that ceiling is higher than the workload requires.
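A minimal sketch of the HNSW path on plain Postgres, assuming the pgvector extension is available and using psycopg from Python; table and column names are placeholders:

```python
import psycopg

conn = psycopg.connect("postgresql://localhost/appdb")
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " id bigserial PRIMARY KEY,"
        " embedding vector(1536));"
    )
    # HNSW index with the cosine-distance operator class.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS items_embedding_hnsw "
        "ON items USING hnsw (embedding vector_cosine_ops);"
    )
    conn.commit()

    query_vec = "[" + ",".join(["0.0"] * 1536) + "]"   # pgvector text format
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 10;",
        (query_vec,),
    )
    print(cur.fetchall())
```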
Designed for huge-corpus low-QPS workloads — scales to 10B+ vectors on object storage with 100-300ms p99 query latency. 10M vectors: works but Pinecone simpler at this scale. 100M-1B vectors: Turbopuffer's sweet spot — cold-storage economics dominate when query rate is low. 1B-10B+ vectors: cleanest path in category for cold-storage scale (Vespa + Milvus alternatives but more operational overhead). Latency: 100-300ms p99 vs Pinecone's 30-50ms — the trade-off is the entire point.
Strong at 1M-100M vectors using Atlas Search's HNSW implementation — scales with the MongoDB Atlas cluster tier you provision. 1M vectors: trivial on M10+ Atlas cluster. 10M-50M vectors: solid with appropriate cluster sizing. 50M-100M vectors: requires larger cluster tier + careful sharding. 100M+ vectors: not the right architecture — Pinecone or Milvus designed for that. Recall: HNSW comparable to purpose-built at moderate scale; degrades vs Milvus + Pinecone at billion scale.
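A sketch of the $vectorSearch aggregation stage via pymongo, assuming an Atlas cluster with a vector search index already defined on the embedding field; the connection string, index, and field names are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://USER:PASS@cluster0.example.mongodb.net")  # placeholder URI
coll = client["app"]["documents"]

pipeline = [
    {
        "$vectorSearch": {
            "index": "embedding_index",     # name of the Atlas vector search index
            "path": "embedding",
            "queryVector": [0.0] * 1536,    # placeholder query embedding
            "numCandidates": 200,
            "limit": 10,
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]
results = list(coll.aggregate(pipeline))
print(results)
```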
The most battle-tested billion-doc-scale engine in the list — runs Yahoo Mail (billions of documents), Spotify recommendations, OkCupid matching at production scale for over a decade. 1M-100M documents: overkill. 100M-1B documents: shines, with hybrid lexical + vector + ML-ranking in one query at sub-200ms p99. 1B-10B+ documents: Vespa's actual lane — distributed architecture purpose-built for search-engine scale. Recall: strong on hybrid retrieval; ML-ranking models run in-engine for second-stage reranking.
Strong at 1M-100M vectors with multi-modal data in same storage layer — Lance columnar format optimizes for ML-workload access patterns (random access + versioning + analytics queries). 1M vectors: trivial in embedded mode. 10M-100M vectors: solid with serverless cloud + object-storage backend. 100M+ vectors: serverless mode emerging, ceiling depends on workload. Recall: IVF + HNSW on Lance format — competitive at 1M-100M scale for multi-modal workloads.
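A minimal embedded-mode sketch with the lancedb Python package; the table, 512-dim vectors, and captions are placeholders standing in for multi-modal data:

```python
import lancedb

db = lancedb.connect("./lance-data")        # local dir; object-store URIs also work
table = db.create_table(
    "frames",
    data=[
        {"id": 1, "vector": [0.1] * 512, "caption": "sunset over bay"},
        {"id": 2, "vector": [0.2] * 512, "caption": "city skyline"},
    ],
)
hits = table.search([0.1] * 512).limit(5).to_list()
print(hits)
```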
Most comparison sites refuse to force-rank because their revenue depends on staying neutral. SideGuy ranks because it doesn't take vendor money. Here's the call by buyer persona.
Your problem: You're at small scale today — 0.1M-10M vectors, single-tenant, 1-100 QPS. Pick wrong and you'll re-architect in 12 months. Pick right and the engine grows with you to Series A. See the Vector Databases megapage for the full operator-honest cluster.
Your problem: Real production scale. 10M-100M vectors, 100-1K QPS, sub-100ms p99 budget. Customer-facing AI features depend on this engine. Pair with the AI Infrastructure megapage for the model-substrate decision.
Your problem: 100M-1B vectors across multiple teams + use cases. Hybrid search (vector + keyword + metadata filter) load-bearing. Sub-200ms p99 budget. Single-engine standardization being evaluated. Coordinate with the Compliance Authority Graph for SOC 2 + DPA across multi-team usage.
Your problem: Billion-vector enterprise scale. Multiple teams, multi-cloud reality, 5-year substrate decision. Recall + QPS at scale dominates the decision. AI-baked-in vs AI-bolted-on matters at this horizon. See /operator cockpit for the operator-layer view.
These rankings are SideGuy's lived-data + observed-buyer-pattern read as of 2026-05-11. They're directional, not gospel. The right answer for YOUR specific situation may diverge — text PJ for a 10-min operator-honest read on your actual buying context.
Vendor pricing + features + market positioning shift quarterly. SideGuy may earn referral commissions from some of these vendors, but rankings are independent — affiliate relationships never change rank order. Sister doctrines: /open/ live operator dashboard · install packs · operator network.
Or skip all of them. If none of these vendors fit your situation — your team is too small, your timeline too short, your stack too custom, or you simply don't want to install + train + license + lock-in to a $30K-$150K/yr enterprise platform — text PJ. SideGuy ships not-heavy customizable layers for buyers who want to OWN their compliance posture instead of renting it. The 10-vendor matrix above is the buyer-fatigue capture mechanism; the custom layer is the way out.
Every vector DB at scale trades recall (% of true nearest neighbors found) against latency. Default index settings give 90-95% recall at sub-100ms latency for most workloads. Pushing recall to 98-99% typically doubles or triples latency. Pinecone, Weaviate, Qdrant, and Milvus all expose recall-vs-latency tuning knobs (HNSW efSearch, IVF nprobe, etc.) — measure on YOUR workload, don't trust marketing benchmark numbers. The honest 2026 default: 95% recall at sub-100ms p99 is the production sweet spot for most AI features. If you need 99%+ recall (e.g. legal discovery, medical search), accept the 200-500ms p99 cost or use a smaller-corpus exact search.
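One way to measure the trade-off on your own data rather than trusting vendor numbers, sketched with hnswlib and random vectors as stand-ins; hnswlib's set_ef is its efSearch knob, and the numbers below are illustrative only:

```python
import time

import hnswlib
import numpy as np

dim, n, k = 768, 50_000, 10
data = np.random.rand(n, dim).astype(np.float32)
queries = data[:500]

# Exact nearest neighbors by brute force, for recall ground truth.
d2 = (queries**2).sum(1)[:, None] + (data**2).sum(1)[None, :] - 2.0 * queries @ data.T
true_ids = np.argsort(d2, axis=1)[:, :k]

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data)

for ef in (16, 64, 256):                    # the recall-vs-latency knob
    index.set_ef(ef)
    t0 = time.perf_counter()
    labels, _ = index.knn_query(queries, k=k)
    ms = (time.perf_counter() - t0) / len(queries) * 1000
    recall = np.mean([len(set(l) & set(t)) / k for l, t in zip(labels, true_ids)])
    print(f"ef={ef:4d}  recall@{k}={recall:.3f}  avg latency={ms:.2f} ms/query")
```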
Rough operator-honest sizing (varies by embedding dimension, recall target, query rate): 1M vectors at 1536-dim (OpenAI ada-002 / text-embedding-3-small): ~6GB raw + ~2-4GB HNSW index = single-node trivial on $20-50/mo VPS or Pinecone free tier. 10M vectors: ~60GB raw + ~20-40GB index = mid-range single-node ($200-500/mo VPS or Pinecone Standard). 100M vectors: ~600GB raw + ~200-400GB index = small distributed cluster ($2K-5K/mo Kubernetes or Pinecone Enterprise tier). 1B vectors: ~6TB raw + ~2-4TB index = serious distributed cluster ($10K-50K+/mo of nodes, GPU-accelerated indexing typically required, or Zilliz Cloud Dedicated / Pinecone Enterprise tier). Cold-storage workloads (Turbopuffer) trade these compute costs for object-storage costs which are ~10-100x cheaper at large scale.
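The arithmetic behind those estimates, as a back-of-envelope helper; the 50% index overhead is an assumption that varies with M, efConstruction, and quantization:

```python
def rough_footprint_gb(num_vectors: int, dim: int = 1536,
                       bytes_per_float: int = 4,
                       index_overhead: float = 0.5) -> tuple[float, float]:
    """Raw float32 storage plus an assumed ~50% HNSW index overhead."""
    raw = num_vectors * dim * bytes_per_float / 1e9
    return raw, raw * index_overhead

for n in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    raw, idx = rough_footprint_gb(n)
    print(f"{n:>13,} vectors: ~{raw:,.0f} GB raw + ~{idx:,.0f} GB index")
```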
Workload typically dictates the engine until ~50M vectors — at small-to-medium scale most engines work; pick based on hybrid needs, multi-tenant needs, embedded vs hosted, and DX. Scale starts to dictate the engine around 100M vectors — purpose-built billion-scale engines (Milvus/Zilliz, Vespa, Pinecone Enterprise) become structurally necessary; pgvector and MongoDB Atlas Vector start to lose on $/QPS, Chroma + LanceDB hit embedded-mode ceilings. By 1B+ vectors the shortlist compresses to 4 realistic options: Pinecone (hosted), Milvus/Zilliz (OSS or Zilliz Cloud), Vespa (search-engine workloads), Turbopuffer (cold-storage). Everything else either can't scale that far or can't compete on $/vector at that scale.
Pinecone Enterprise + Multi-region (multiple cloud regions, automatic replication) — strongest native support. Zilliz Cloud Dedicated tier supports multi-region. Weaviate Cloud Services + Enterprise BYOC can deploy multi-region with engineering effort. Qdrant + Milvus self-host can be deployed multi-region with manual replication architecture. pgvector inherits Postgres replication (which is real but operationally complex for multi-region). MongoDB Atlas Vector inherits Atlas multi-region (real and operationally polished). Turbopuffer + LanceDB serverless inherit object-storage multi-region (cheap but adds latency on cold queries). The honest 2026 default for global low-latency AI products: Pinecone Enterprise + multi-region replication is the cleanest path; everything else requires meaningful engineering investment.
10-minute operator-honest read on your actual buying context. No deck, no demo call, no signup. If we're not the right fit, we'll say so.
📱 Text PJ · 858-461-8054. Skip the 5 vendor demos. 30-day delivery. No procurement cycle. No demo theater. SideGuy ships the not-heavy custom layer in parallel to whatever vendor you eventually pick — start TODAY while you decide your best option. Custom builds in 30 days →
📱 Urgent? Text PJ · 858-461-8054. I'm almost positive I can help. If I can't, you don't pay.
No signup. No seminar. No bullshit.
Don't see what you were looking for?
Text PJ a sentence about what you actually need — I'll build you a free custom shareable on the house. No email, no funnel, no SOW.
📲 Text PJ — free shareable