Every production RAG system needs a vector database. The choices have multiplied: Pinecone, Weaviate, Qdrant, Milvus, Chroma, and pgvector are the primary contenders, with new entrants appearing regularly. Each makes different tradeoffs between performance, operational complexity, cost, and scalability.
This guide compares the six major options with benchmark data at 1M, 10M, and 100M vector scales, plus clear recommendations by use case.
The Landscape in 2026
The vector database market has matured significantly since the RAG boom of 2023-2024. Early fragmentation is giving way to consolidation around a few clear winners in each category:
- Managed cloud: Pinecone dominates for teams that want zero operational burden
- Self-hosted performance: Qdrant and Milvus lead for teams that want maximum throughput
- Integrated: pgvector wins for teams that want vector search inside their existing PostgreSQL
- Embedded/prototyping: Chroma remains the simplest option for development and small-scale use
- Hybrid search: Weaviate leads for teams that need strong keyword + semantic search combination
Detailed Comparison
Pinecone
Architecture: Fully managed cloud service. Serverless and pod-based deployment options. No infrastructure to manage.

Strengths:
- Zero operational burden. No clusters to manage, no backups to configure, no scaling to handle.
- Serverless tier scales to zero (pay only for what you use).
- Strong consistency guarantees.
- Built-in metadata filtering.
- Good documentation and developer experience.
Weaknesses:
- Most expensive at scale (highest per-query cost above 10M vectors).
- Limited querying flexibility compared to self-hosted options.
- No on-premise deployment option.
- Vendor lock-in: your data lives in Pinecone's infrastructure.
Qdrant
Architecture: Written in Rust. Available as cloud-managed service or self-hosted. Single-node and distributed cluster deployments.

Strengths:
- Excellent query performance (consistently among the fastest in benchmarks).
- Rich filtering capabilities with payload-based queries.
- Strong hybrid search support (sparse + dense vectors).
- Active open-source community and rapid development.
- Flexible deployment: cloud, self-hosted, or embedded.
- Lower cost than Pinecone at scale when self-hosted.
Weaknesses:
- Self-hosted deployment requires infrastructure management.
- Distributed mode adds operational complexity.
- Smaller ecosystem than Pinecone or Weaviate.
Weaviate
Architecture: Written in Go. Cloud-managed and self-hosted options. Built-in vectorization (can generate embeddings from text/images).

Strengths:
- Best hybrid search implementation (BM25 + vector search, well-integrated).
- Built-in vectorization modules (no external embedding service needed for basic use).
- Multi-tenancy support for SaaS applications.
- Good GraphQL API for complex queries.
- Strong schema support and data modeling.
Weaknesses:
- Higher memory usage than Qdrant or Milvus.
- Slower raw vector search performance than Qdrant at high scale.
- More complex setup and configuration.
- GraphQL API has a learning curve.
Milvus
Architecture: Distributed architecture designed for massive scale. Cloud-managed (Zilliz Cloud) and self-hosted. Written in Go with a C++ core.

Strengths:
- Highest throughput at 100M+ vector scale.
- Purpose-built distributed architecture (not a single-node system scaled up).
- GPU-accelerated search available.
- Strong batch operation performance.
- Most mature project (started in 2019; an LF AI &amp; Data Foundation graduated project).
Weaknesses:
- Most complex to operate self-hosted (multiple components: proxy, query nodes, data nodes, etc.).
- Higher minimum infrastructure requirements.
- Overkill for small-scale applications.
- Steeper learning curve than alternatives.
pgvector
Architecture: PostgreSQL extension. Runs inside your existing PostgreSQL database. No separate service required.

Strengths:
- Zero additional infrastructure (uses your existing PostgreSQL).
- Join vector search results with relational data in a single query.
- Familiar SQL interface.
- Transaction support (ACID guarantees).
- Easy to deploy and manage (if you already run PostgreSQL).
- Cost is just your PostgreSQL instance cost.
Weaknesses:
- Slower than dedicated vector databases at scale.
- Performance degrades significantly above 5M vectors without careful tuning.
- Limited indexing options (HNSW and IVFFlat).
- No built-in distributed scaling.
- Competes with your relational workload for resources.
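pgvector's headline strength, joining vector results with relational data in one query, looks like this in practice. A minimal sketch assuming a hypothetical documents/users schema: the snippet only builds the SQL string, and running it requires PostgreSQL with the pgvector extension and an HNSW or IVFFlat index on the embedding column.

```python
# Hypothetical schema: a documents table with an embedding vector(1536) column
# alongside normal relational columns. <=> is pgvector's cosine distance operator.
query = """
SELECT d.id,
       d.title,
       u.name AS author,
       d.embedding <=> %(q)s::vector AS cosine_distance
FROM documents d
JOIN users u ON u.id = d.author_id
WHERE d.published_at > now() - interval '30 days'
ORDER BY d.embedding <=> %(q)s::vector
LIMIT 10;
"""

# With a driver like psycopg, the query vector is passed as a parameter:
#   cur.execute(query, {"q": str(embedding_list)})
```

The point of the sketch is the shape of the query: the metadata filter, the relational join, and the nearest-neighbor ordering all happen in one SQL statement, which no dedicated vector database can offer.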
Chroma
Architecture: Embedded database. Runs in-process with your application. Also available as a client-server deployment.

Strengths:
- Simplest to get started: pip install chromadb and you're running.
- Embedded mode requires no infrastructure.
- Good for local development and testing.
- Clean, simple API.
- Free and open-source.
Weaknesses:
- Not designed for production scale (performance drops above 1M vectors).
- Limited durability guarantees in embedded mode.
- No distributed scaling.
- Fewer filtering and querying options than alternatives.
Benchmark Data
The figures below are approximate ranges drawn from published benchmarks and our own testing. Actual performance depends on hardware, indexing parameters, query patterns, and data characteristics.
Query Latency (p50, 1536 Dimensions, HNSW Index)
At 1M vectors:
- Qdrant: 5-10ms
- Milvus: 5-12ms
- Pinecone: 10-20ms
- Weaviate: 8-18ms
- pgvector: 15-30ms
- Chroma: 10-25ms
At 10M vectors:
- Qdrant: 8-20ms
- Milvus: 8-18ms
- Pinecone: 15-35ms
- Weaviate: 15-35ms
- pgvector: 40-100ms
- Chroma: 50-150ms (not recommended at this scale)
At 100M vectors:
- Qdrant: 15-40ms
- Milvus: 12-35ms
- Pinecone: 25-60ms
- Weaviate: 30-70ms
- pgvector: not recommended at this scale
- Chroma: not recommended at this scale
Throughput (Queries per Second, Single Node)
At 1M vectors:
- Qdrant: 800-1,500 QPS
- Milvus: 1,000-2,000 QPS
- Pinecone: 500-1,000 QPS (managed, varies by plan)
- Weaviate: 500-1,200 QPS
- pgvector: 200-600 QPS
- Chroma: 300-800 QPS
At 10M vectors:
- Qdrant: 400-900 QPS
- Milvus: 500-1,200 QPS
- Pinecone: 300-700 QPS
- Weaviate: 300-700 QPS
- pgvector: 50-200 QPS
Memory Usage (Per 1M Vectors, 1536 Dimensions)
- Qdrant: ~6-8 GB RAM
- Milvus: ~6-10 GB RAM
- Pinecone: managed (not visible)
- Weaviate: ~8-12 GB RAM
- pgvector: ~4-6 GB (shared with PostgreSQL memory)
- Chroma: ~4-7 GB RAM
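These ranges can be sanity-checked with back-of-envelope arithmetic: raw float32 vectors account for most of the footprint, and index structures plus metadata add the rest, which is why the measured numbers sit above the raw figure. A small sketch:

```python
def raw_vector_gb(num_vectors: int, dims: int, bytes_per_float: int = 4) -> float:
    """Memory for the raw float32 vectors alone, before index overhead."""
    return num_vectors * dims * bytes_per_float / 1024**3

# 1M vectors at 1536 dimensions: ~5.72 GB before HNSW graph links,
# payload storage, and runtime buffers are added on top.
baseline = raw_vector_gb(1_000_000, 1536)
```

The same arithmetic explains why dropping from 1536 to 768 dimensions roughly halves memory cost across every database in the list.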
Decision Framework
Step 1: Determine Your Scale
- Under 1M vectors: Any option works. Choose based on team preferences and existing infrastructure. pgvector if you use PostgreSQL. Chroma for prototyping. Qdrant or Pinecone for production.
- 1M-10M vectors: Dedicated vector database recommended. Qdrant, Weaviate, or Pinecone. Choose based on operational preference (managed vs self-hosted) and feature needs (hybrid search, filtering).
- 10M-100M+ vectors: Pinecone (managed, simple scaling), Qdrant (performance-focused), or Milvus (highest throughput). pgvector and Chroma are not appropriate at this scale.

Step 2: Evaluate Operational Preferences
- Want zero ops: Pinecone (fully managed) or Qdrant Cloud / Weaviate Cloud (managed options).
- Comfortable with infrastructure: Qdrant or Milvus self-hosted (best performance per dollar).
- Already run PostgreSQL: pgvector (no new infrastructure) up to 5M vectors.
- Just prototyping: Chroma (embedded, zero setup).

Step 3: Check Feature Requirements
- Need hybrid search (keyword + semantic): Weaviate (best implementation) or Qdrant (good sparse vector support).
- Need rich metadata filtering: Qdrant (strongest filtering) or Weaviate (GraphQL queries).
- Need relational joins: pgvector (SQL joins with vector results).
- Need multi-tenancy: Weaviate (built-in) or Qdrant (collection-per-tenant pattern).
- Need GPU acceleration: Milvus (native GPU support).

Step 4: Consider Cost
- Lowest cost at small scale (under 1M): pgvector (cost of existing PostgreSQL), Chroma (free).
- Lowest cost at medium scale (1M-10M): Self-hosted Qdrant or Milvus. Infrastructure costs only.
- Lowest cost at large scale (10M+): Self-hosted Milvus (highest throughput per dollar) or self-hosted Qdrant.
- Lowest operational cost (any scale): Pinecone (managed, no infrastructure team needed). Higher per-query cost but zero ops cost.

Embedding Model Selection
The vector database is only as good as the embeddings it stores. Embedding model selection significantly impacts retrieval quality.
Recommended Models (2026)
- OpenAI text-embedding-3-small (1536 dimensions): The default choice. Good quality, reasonable cost ($0.02/1M tokens), widely supported.
- OpenAI text-embedding-3-large (3072 dimensions): Higher quality, 2x storage cost. Use when retrieval precision is critical and storage/compute budget allows.
- Cohere embed-v3 (1024 dimensions): Strong multilingual performance. Best choice for non-English or multilingual applications.
- BGE-M3 (1024 dimensions, open-source): Best open-source option. Runs locally without API costs. Strong multilingual support.
- Nomic embed-text-v1.5 (768 dimensions, open-source): Good quality at lower dimensionality. Efficient for storage-constrained deployments.

Dimension Tradeoffs
Higher dimensions improve retrieval quality for nuanced semantic queries, but storage cost, memory usage, and query latency all grow roughly linearly with dimension count. For most RAG applications, 1024-1536 dimensions provide a good balance. Going above 1536 provides diminishing returns unless your queries require very fine-grained semantic distinction.
Matryoshka embeddings (supported by text-embedding-3 models) let you reduce dimensions after generation. Generate at full dimensions, store at reduced dimensions (512 or 768) for a storage/quality tradeoff that you control.
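A minimal sketch of the truncate-and-renormalize step. This assumes the model was trained with a matryoshka-style objective (as the text-embedding-3 models are); truncating embeddings from arbitrary models loses far more quality.

```python
import math

def truncate_embedding(vec: list[float], k: int) -> list[float]:
    """Keep the first k dimensions and re-normalize to unit length,
    so cosine similarity remains meaningful after truncation."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 4-dimensional vector truncated to 2 dimensions.
e = truncate_embedding([3.0, 4.0, 0.1, 0.2], 2)
```

Store the truncated vectors, keep the full-dimension originals in cheap object storage, and you can revisit the tradeoff later without re-embedding.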
Migration Strategies
Starting Fresh
Pick one database and commit. The switching cost between vector databases is moderate (re-embedding and re-indexing data takes hours to days depending on scale). But the cost of running two databases in parallel is high. Make a decision and move forward.
Migrating Between Vector Databases
The migration path:
- Export vectors and metadata from the source database
- Transform data format to match the target database's schema
- Load data into the target database
- Update application code (query API changes)
- Run parallel queries to verify result consistency
- Switch traffic to the new database
- Decommission the old database
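The first three steps amount to a batch copy loop. In this sketch, source and target are hypothetical client objects, stand-ins for whatever export and upsert APIs your actual databases expose (a pgvector cursor, a Qdrant client, and so on):

```python
def migrate(source, target, batch_size: int = 1000) -> int:
    """Copy vectors in batches: export, transform to the target schema, load.
    Assumes source.scroll() yields batches of dicts and target.upsert()
    accepts a list of points; adapt both to your real clients."""
    moved = 0
    for batch in source.scroll(batch_size=batch_size):          # step 1: export
        points = [
            {"id": p["id"], "vector": p["vector"], "payload": p["metadata"]}
            for p in batch                                      # step 2: transform
        ]
        target.upsert(points)                                   # step 3: load
        moved += len(points)
    return moved
```

Batching matters: single-point inserts are often 10-100x slower than batch upserts, which is the difference between a migration taking hours and taking days.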
Abstraction Layers
Libraries like LangChain and LlamaIndex abstract the vector database interface, making it possible to switch databases with configuration changes rather than code rewrites. The tradeoff: you lose access to database-specific features and optimizations. For applications that might need to change databases, using an abstraction layer from the start reduces future migration cost.
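If you would rather not take a framework dependency, a thin protocol of your own buys much of the same decoupling. A sketch with illustrative names (a real deployment would add one adapter class per database, e.g. a QdrantStore or PineconeStore matching the same protocol):

```python
from typing import Protocol

class VectorStore(Protocol):
    """The minimal surface the application codes against."""
    def upsert(self, ids: list[str], vectors: list[list[float]],
               metadata: list[dict]) -> None: ...
    def search(self, vector: list[float], top_k: int) -> list[str]: ...

class InMemoryStore:
    """Toy adapter for tests; real adapters wrap a database client."""
    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def upsert(self, ids, vectors, metadata) -> None:
        self._rows.update(zip(ids, vectors))

    def search(self, vector, top_k) -> list[str]:
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        ranked = sorted(self._rows, key=lambda i: dot(self._rows[i], vector),
                        reverse=True)
        return ranked[:top_k]
```

The tradeoff is the same as with LangChain or LlamaIndex: database-specific features (Qdrant payload filters, Weaviate hybrid queries) need escape hatches outside the protocol.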
Common Pitfalls
Choosing Based on Benchmarks Alone
Published benchmarks test specific configurations that may not match your workload. Always benchmark with your actual data, query patterns, and hardware. A database that's fastest on ANN-benchmarks might not be fastest for your specific combination of dimensions, filter patterns, and query volume.
Over-Provisioning for Future Scale
Don't buy infrastructure for 100M vectors when you have 500K. Start with a solution that fits your current scale and plan migration if you outgrow it. pgvector at 500K vectors is simpler and cheaper than a Milvus cluster. Upgrade when the need is real, not when it's hypothetical.
Ignoring Hybrid Search
Pure semantic search misses exact keyword matches that users expect. "Show me documents about RFC 2616" should find documents containing that exact string, not semantically similar concepts. Hybrid search (combining vector similarity with keyword matching) improves recall by 20-30% for queries with specific terms, names, or identifiers.
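One common way to combine the keyword and vector rankings is reciprocal rank fusion (RRF), which needs only the rank positions from each retriever, no score normalization. A minimal sketch (k=60 is the conventional constant):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists of doc ids: each document scores
    1/(k + rank) per list it appears in, then sort by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 ranking and vector ranking disagree; RRF rewards documents
# that sit near the top of both lists.
merged = rrf([["d1", "d2", "d3"], ["d2", "d3", "d1"]])
```

Weaviate and Qdrant implement fusion like this server-side; with pgvector you would run a BM25-style query and a vector query separately and fuse in application code.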
Not Testing Failure Modes
What happens when the vector database is unavailable? What happens when a query returns no results above the similarity threshold? What happens when the index is being rebuilt? Production systems need answers to these questions before they handle real traffic.
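A sketch of one answer to the first two questions: wrap retrieval so an outage degrades the response instead of failing the whole request. Function and threshold names are illustrative.

```python
def retrieve_with_fallback(search_fn, query_vector, threshold: float = 0.7) -> list:
    """Return hits above the similarity threshold; on a connection
    failure, return an empty context (the caller answers without
    retrieval and should raise an alert) rather than erroring out."""
    try:
        hits = search_fn(query_vector)
    except ConnectionError:
        return []  # degraded mode: no retrieved context
    return [h for h in hits if h["score"] >= threshold]
```

An empty return also covers the no-results-above-threshold case explicitly, so the generation layer can say "I don't know" instead of hallucinating from weak matches.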
Skipping Evaluation
Changing your vector database, embedding model, or chunking strategy affects retrieval quality. Measure it. Build an evaluation set of queries with known relevant documents. Run it every time you change the retrieval stack. Without evaluation, you won't know if a change improved or degraded results until users complain.
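A minimal sketch of such an evaluation, computing recall@k over a hand-labeled set (names are illustrative): eval_set maps each query id to the doc ids known to be relevant, and retrieved holds what your retrieval stack returned for each query.

```python
def recall_at_k(eval_set: dict[str, set], retrieved: dict[str, list],
                k: int = 5) -> float:
    """Average fraction of known-relevant docs found in the top k results."""
    scores = []
    for qid, relevant in eval_set.items():
        top_k = set(retrieved.get(qid, [])[:k])
        scores.append(len(top_k & relevant) / len(relevant))
    return sum(scores) / len(scores)
```

Run this before and after every change to the retrieval stack; a single averaged number per configuration is enough to catch regressions that would otherwise only surface as user complaints.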