Every production RAG system needs a vector database. The choices have multiplied: Pinecone, Weaviate, Qdrant, Milvus, Chroma, and pgvector are the primary contenders, with new entrants appearing regularly. Each makes different tradeoffs between performance, operational complexity, cost, and scalability.

This guide compares the six major options with benchmark data at 1M, 10M, and 100M vector scales, plus clear recommendations by use case.

The Landscape in 2026


The vector database market has matured significantly since the RAG boom of 2023-2024. Early fragmentation is giving way to consolidation around a few clear winners in each category:

  • Managed cloud: Pinecone dominates for teams that want zero operational burden
  • Self-hosted performance: Qdrant and Milvus lead for teams that want maximum throughput
  • Integrated: pgvector wins for teams that want vector search inside their existing PostgreSQL
  • Embedded/prototyping: Chroma remains the simplest option for development and small-scale use
  • Hybrid search: Weaviate leads for teams that need strong keyword + semantic search combination

Detailed Comparison

Pinecone

Architecture: Fully managed cloud service. Serverless and pod-based deployment options. No infrastructure to manage.

Strengths:
  • Zero operational burden. No clusters to manage, no backups to configure, no scaling to handle.
  • Serverless tier scales to zero (pay only for what you use).
  • Strong consistency guarantees.
  • Built-in metadata filtering.
  • Good documentation and developer experience.
Weaknesses:
  • Most expensive at scale (highest per-query cost above 10M vectors).
  • Limited querying flexibility compared to self-hosted options.
  • No on-premise deployment option.
  • Vendor lock-in: your data lives in Pinecone's infrastructure.
Best for: Teams that prioritize operational simplicity over cost optimization. Startups and small teams without dedicated infrastructure engineers. Applications where reliability matters more than per-query cost.

Pricing: Serverless starts at ~$0.33/1M reads, $2/1M writes, plus storage. Pod-based starts at ~$70/month for the smallest pod. Costs scale with vector count, dimension, and query volume.
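To make serverless pricing concrete, here is a rough monthly cost estimator using the approximate per-read/per-write rates quoted above. The storage rate is a deliberate placeholder (the article only says "plus storage"), so it must be supplied by the caller; the workload numbers in the example are hypothetical.

```python
def estimate_serverless_cost(reads_m, writes_m, storage_gb, storage_rate,
                             read_rate=0.33, write_rate=2.0):
    """Rough monthly USD cost for a serverless vector database.

    reads_m / writes_m are millions of operations per month. The default
    read/write rates are the approximate figures quoted above;
    storage_rate is a placeholder you should replace with current pricing.
    """
    return reads_m * read_rate + writes_m * write_rate + storage_gb * storage_rate

# Hypothetical workload: 50M reads, 5M writes, 20 GB of stored vectors,
# with an assumed $0.30/GB-month storage rate (check current pricing).
monthly = estimate_serverless_cost(50, 5, 20, storage_rate=0.30)
```

A calculator like this is mostly useful for comparing the read-heavy vs. write-heavy cost profiles of different workloads before committing to a provider.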

Qdrant

Architecture: Written in Rust. Available as a cloud-managed service or self-hosted. Single-node and distributed cluster deployments.

Strengths:
  • Excellent query performance (consistently among the fastest in benchmarks).
  • Rich filtering capabilities with payload-based queries.
  • Strong hybrid search support (sparse + dense vectors).
  • Active open-source community and rapid development.
  • Flexible deployment: cloud, self-hosted, or embedded.
  • Lower cost than Pinecone at scale when self-hosted.
Weaknesses:
  • Self-hosted deployment requires infrastructure management.
  • Distributed mode adds operational complexity.
  • Smaller ecosystem than Pinecone or Weaviate.
Best for: Teams that want strong performance and are comfortable managing infrastructure. Applications that need rich filtering and hybrid search. Cost-conscious teams at 1M-100M vector scale.

Pricing: Cloud starts at ~$65/month. Self-hosted: infrastructure costs only. Typically 30-50% cheaper than Pinecone at comparable scale.

Weaviate

Architecture: Written in Go. Cloud-managed and self-hosted options. Built-in vectorization (can generate embeddings from text/images).

Strengths:
  • Best hybrid search implementation (BM25 + vector search, well-integrated).
  • Built-in vectorization modules (no external embedding service needed for basic use).
  • Multi-tenancy support for SaaS applications.
  • Good GraphQL API for complex queries.
  • Strong schema support and data modeling.
Weaknesses:
  • Higher memory usage than Qdrant or Milvus.
  • Slower raw vector search performance than Qdrant at high scale.
  • More complex setup and configuration.
  • GraphQL API has a learning curve.
Best for: Applications that need strong hybrid search (keyword + semantic). Multi-tenant SaaS products. Teams that want built-in vectorization. Document-heavy applications with structured metadata.

Pricing: Serverless starts at ~$25/month. Standard: $75-$1,500/month depending on resources. Self-hosted: infrastructure costs only.

Milvus

Architecture: Distributed architecture designed for massive scale. Cloud-managed (Zilliz Cloud) and self-hosted. Written in Go with a C++ core.

Strengths:
  • Highest throughput at 100M+ vector scale.
  • Purpose-built distributed architecture (not a single-node system scaled up).
  • GPU-accelerated search available.
  • Strong batch operation performance.
  • Most mature project (started in 2019; a graduated LF AI & Data Foundation project).
Weaknesses:
  • Most complex to operate self-hosted (multiple components: proxy, query nodes, data nodes, etc.).
  • Higher minimum infrastructure requirements.
  • Overkill for small-scale applications.
  • Steeper learning curve than alternatives.
Best for: Very large-scale applications (50M+ vectors). Teams with dedicated infrastructure engineers. High-throughput batch processing workloads. Applications requiring GPU-accelerated search.

Pricing: Zilliz Cloud starts at ~$65/month. Self-hosted: infrastructure costs vary widely by scale ($200-$2,000+/month for moderate to large deployments).

pgvector

Architecture: PostgreSQL extension. Runs inside your existing PostgreSQL database. No separate service required.

Strengths:
  • Zero additional infrastructure (uses your existing PostgreSQL).
  • Join vector search results with relational data in a single query.
  • Familiar SQL interface.
  • Transaction support (ACID guarantees).
  • Easy to deploy and manage (if you already run PostgreSQL).
  • Cost is just your PostgreSQL instance cost.
Weaknesses:
  • Slower than dedicated vector databases at scale.
  • Performance degrades significantly above 5M vectors without careful tuning.
  • Limited indexing options (HNSW and IVFFlat).
  • No built-in distributed scaling.
  • Competes with your relational workload for resources.
Best for: Teams that already use PostgreSQL and want to avoid adding infrastructure. Applications under 5M vectors. Use cases that need to join vector results with relational data. Prototypes and MVPs.

Pricing: Cost of your PostgreSQL instance. On AWS RDS, a db.m5.xlarge (suitable for ~1M vectors): ~$200/month. No per-query pricing.
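The single-query join with relational data is pgvector's key differentiator, so it is worth seeing the SQL shape. Table and column names below (documents, authors, embedding) are hypothetical; `<=>` is pgvector's cosine-distance operator (`<->` is L2 distance, `<#>` negative inner product).

```python
# Sketch of a pgvector query that ranks by cosine distance and joins
# the hits with relational data in one statement. Table and column
# names are hypothetical placeholders for your own schema.
PGVECTOR_JOIN_QUERY = """
SELECT d.id,
       d.title,
       a.name AS author,
       d.embedding <=> %(query_vec)s AS distance
FROM documents AS d
JOIN authors AS a ON a.id = d.author_id
ORDER BY d.embedding <=> %(query_vec)s
LIMIT %(k)s;
"""

# With a driver such as psycopg you would bind the parameters, e.g.:
# cur.execute(PGVECTOR_JOIN_QUERY, {"query_vec": str(query_vec), "k": 10})
```

A dedicated vector database would need two round trips (vector search, then a lookup against your relational store) to produce the same result.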

Chroma

Architecture: Embedded database. Runs in-process with your application. Also available as a client-server deployment.

Strengths:
  • Simplest to get started: pip install chromadb and you're running.
  • Embedded mode requires no infrastructure.
  • Good for local development and testing.
  • Clean, simple API.
  • Free and open-source.
Weaknesses:
  • Not designed for production scale (performance drops above 1M vectors).
  • Limited durability guarantees in embedded mode.
  • No distributed scaling.
  • Fewer filtering and querying options than alternatives.
Best for: Local development and prototyping. Small-scale applications (under 1M vectors). Embedded use cases. Getting started with vector search quickly.

Pricing: Free (open-source). Cloud hosting in development.

Benchmark Data

These benchmarks represent approximate performance ranges based on published benchmarks and our testing. Actual performance depends on hardware, indexing parameters, query patterns, and data characteristics.

Query Latency (p50, 1536 Dimensions, HNSW Index)

At 1M vectors:
  • Qdrant: 5-10ms
  • Milvus: 5-12ms
  • Pinecone: 10-20ms
  • Weaviate: 8-18ms
  • pgvector: 15-30ms
  • Chroma: 10-25ms
At 10M vectors:
  • Qdrant: 8-20ms
  • Milvus: 8-18ms
  • Pinecone: 15-35ms
  • Weaviate: 15-35ms
  • pgvector: 40-100ms
  • Chroma: 50-150ms (not recommended at this scale)
At 100M vectors:
  • Qdrant: 15-40ms
  • Milvus: 12-35ms
  • Pinecone: 25-60ms
  • Weaviate: 30-70ms
  • pgvector: not recommended at this scale
  • Chroma: not recommended at this scale

Throughput (Queries per Second, Single Node)

At 1M vectors:
  • Qdrant: 800-1,500 QPS
  • Milvus: 1,000-2,000 QPS
  • Pinecone: 500-1,000 QPS (managed, varies by plan)
  • Weaviate: 500-1,200 QPS
  • pgvector: 200-600 QPS
  • Chroma: 300-800 QPS
At 10M vectors:
  • Qdrant: 400-900 QPS
  • Milvus: 500-1,200 QPS
  • Pinecone: 300-700 QPS
  • Weaviate: 300-700 QPS
  • pgvector: 50-200 QPS

Memory Usage (Per 1M Vectors, 1536 Dimensions)

  • Qdrant: ~6-8 GB RAM
  • Milvus: ~6-10 GB RAM
  • Pinecone: managed (not visible)
  • Weaviate: ~8-12 GB RAM
  • pgvector: ~4-6 GB (shared with PostgreSQL memory)
  • Chroma: ~4-7 GB RAM
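The figures above track the raw vector payload plus index overhead. The floor is easy to compute: float32 vectors cost 4 bytes per dimension, so 1M vectors at 1536 dimensions need roughly 5.7 GiB before any graph structure is added. A quick sanity check:

```python
def raw_vector_gib(n_vectors: int, dims: int, bytes_per_float: int = 4) -> float:
    """GiB needed just to hold raw float32 vectors (no index overhead)."""
    return n_vectors * dims * bytes_per_float / 2**30

base = raw_vector_gib(1_000_000, 1536)  # ~5.7 GiB of raw payload
# HNSW graph links, payloads, and allocator overhead typically add
# another 30-100% on top, consistent with the 6-12 GB figures above.
```

This arithmetic is also the fastest way to size instances before benchmarking: multiply by your expected vector count and add headroom for the index.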

Decision Framework

Step 1: Determine Your Scale

  • Under 1M vectors: Any option works. Choose based on team preferences and existing infrastructure: pgvector if you use PostgreSQL, Chroma for prototyping, Qdrant or Pinecone for production.
  • 1M-10M vectors: A dedicated vector database is recommended: Qdrant, Weaviate, or Pinecone. Choose based on operational preference (managed vs. self-hosted) and feature needs (hybrid search, filtering).
  • 10M-100M+ vectors: Pinecone (managed, simple scaling), Qdrant (performance-focused), or Milvus (highest throughput). pgvector and Chroma are not appropriate at this scale.

Step 2: Evaluate Operational Preferences

  • Want zero ops: Pinecone (fully managed), or the managed tiers of Qdrant Cloud / Weaviate Cloud.
  • Comfortable with infrastructure: Qdrant or Milvus self-hosted (best performance per dollar).
  • Already run PostgreSQL: pgvector (no new infrastructure) up to ~5M vectors.
  • Just prototyping: Chroma (embedded, zero setup).

Step 3: Check Feature Requirements

  • Need hybrid search (keyword + semantic): Weaviate (best implementation) or Qdrant (good sparse vector support).
  • Need rich metadata filtering: Qdrant (strongest filtering) or Weaviate (GraphQL queries).
  • Need relational joins: pgvector (SQL joins with vector results).
  • Need multi-tenancy: Weaviate (built-in) or Qdrant (collection-per-tenant pattern).
  • Need GPU acceleration: Milvus (native GPU support).

Step 4: Consider Cost

  • Lowest cost at small scale (under 1M): pgvector (cost of existing PostgreSQL) or Chroma (free).
  • Lowest cost at medium scale (1M-10M): Self-hosted Qdrant or Milvus (infrastructure costs only).
  • Lowest cost at large scale (10M+): Self-hosted Milvus (highest throughput per dollar) or self-hosted Qdrant.
  • Lowest operational cost (any scale): Pinecone (managed, no infrastructure team needed). Higher per-query cost but zero ops cost.

Embedding Model Selection

The vector database is only as good as the embeddings it stores. Embedding model selection significantly impacts retrieval quality.

Recommended Models (2026)

  • OpenAI text-embedding-3-small (1536 dimensions): The default choice. Good quality, reasonable cost ($0.02/1M tokens), widely supported.
  • OpenAI text-embedding-3-large (3072 dimensions): Higher quality at 2x the storage cost. Use when retrieval precision is critical and the storage/compute budget allows.
  • Cohere embed-v3 (1024 dimensions): Strong multilingual performance. Best choice for non-English or multilingual applications.
  • BGE-M3 (1024 dimensions, open-source): Best open-source option. Runs locally without API costs. Strong multilingual support.
  • Nomic embed-text-v1.5 (768 dimensions, open-source): Good quality at lower dimensionality. Efficient for storage-constrained deployments.

Dimension Tradeoffs

Higher dimensions improve retrieval quality for nuanced semantic queries but increase storage costs, memory usage, and query latency linearly. For most RAG applications, 1024-1536 dimensions provide a good balance. Going above 1536 provides diminishing returns unless your queries require very fine-grained semantic distinction.

Matryoshka embeddings (supported by text-embedding-3 models) let you reduce dimensions after generation. Generate at full dimensions, store at reduced dimensions (512 or 768) for a storage/quality tradeoff that you control.
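The truncation step is simple, but one detail matters: cosine similarity assumes unit-norm vectors, so the truncated prefix should be re-normalized. A minimal sketch (the 4-dimension "embedding" below is a toy example):

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length,
    the standard way to shrink a Matryoshka-trained embedding."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]            # toy 4-dim "embedding"
short = truncate_embedding(full, 2)     # -> [0.7071..., 0.7071...]
```

Because the leading dimensions carry most of the signal in Matryoshka-trained models, you can generate once at 1536 dimensions, store at 512 or 768, and keep the full vectors in cold storage in case you later decide you need them.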

Migration Strategies

Starting Fresh

Pick one database and commit. The switching cost between vector databases is moderate (re-embedding and re-indexing data takes hours to days depending on scale). But the cost of running two databases in parallel is high. Make a decision and move forward.

Migrating Between Vector Databases

The migration path:

  1. Export vectors and metadata from the source database
  2. Transform data format to match the target database's schema
  3. Load data into the target database
  4. Update application code (query API changes)
  5. Run parallel queries to verify result consistency
  6. Switch traffic to the new database
  7. Decommission the old database
Most migrations take 1-2 weeks of engineering time for small to medium deployments. Large-scale migrations (100M+ vectors) can take 2-4 weeks including validation.
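Steps 1-3 above can be sketched generically. In this sketch, source_scroll, transform, and target_upsert are placeholders for whatever the two databases' clients actually provide (e.g. a scroll/pagination API on the source and a batch-upsert call on the target):

```python
def migrate(source_scroll, transform, target_upsert, batch_size=1000):
    """Stream records from the source DB, reshape them, bulk-load the target.

    source_scroll: iterable of (id, vector, metadata) tuples from the old DB.
    transform:     maps one source record to the target DB's document shape.
    target_upsert: callable that writes one batch of transformed documents.
    All three are placeholders for real client calls. Returns records moved.
    """
    batch, total = [], 0
    for record in source_scroll:
        batch.append(transform(record))
        if len(batch) >= batch_size:
            target_upsert(batch)       # one bulk write per batch
            total += len(batch)
            batch = []
    if batch:                          # flush the final partial batch
        target_upsert(batch)
        total += len(batch)
    return total
```

Batching matters: loading one record at a time is usually the difference between a migration that takes hours and one that takes days.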

Abstraction Layers

Libraries like LangChain and LlamaIndex abstract the vector database interface, making it possible to switch databases with configuration changes rather than code rewrites. The tradeoff: you lose access to database-specific features and optimizations. For applications that might need to change databases, using an abstraction layer from the start reduces future migration cost.

Common Pitfalls

Choosing Based on Benchmarks Alone

Published benchmarks test specific configurations that may not match your workload. Always benchmark with your actual data, query patterns, and hardware. A database that's fastest on ANN-benchmarks might not be fastest for your specific combination of dimensions, filter patterns, and query volume.

Over-Provisioning for Future Scale

Don't buy infrastructure for 100M vectors when you have 500K. Start with a solution that fits your current scale and plan migration if you outgrow it. pgvector at 500K vectors is simpler and cheaper than a Milvus cluster. Upgrade when the need is real, not when it's hypothetical.

Ignoring Hybrid Search

Pure semantic search misses exact keyword matches that users expect. "Show me documents about RFC 2616" should find documents containing that exact string, not semantically similar concepts. Hybrid search (combining vector similarity with keyword matching) improves recall by 20-30% for queries with specific terms, names, or identifiers.
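A common way to combine the two result lists is reciprocal rank fusion (RRF), which several hybrid-search implementations offer as a fusion mode. A minimal version, with toy document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc ids into one ranking.

    Documents ranked highly in any input list float to the top;
    k=60 is the conventional constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: a keyword (BM25) ranking and a vector-similarity ranking.
keyword_hits = ["rfc2616", "http-guide", "caching"]
vector_hits = ["http-guide", "rest-apis", "rfc2616"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF needs only ranks, not scores, which is why it works even when the keyword and vector scores live on incompatible scales.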

Not Testing Failure Modes

What happens when the vector database is unavailable? What happens when a query returns no results above the similarity threshold? What happens when the index is being rebuilt? Production systems need answers to these questions before they handle real traffic.
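Those three questions imply a defensive wrapper around the retrieval call: catch outages, enforce a similarity floor, and degrade to keyword search rather than returning nothing. A sketch, where search_fn and keyword_fallback are placeholders for your real clients:

```python
def safe_search(search_fn, keyword_fallback, query_vec, query_text,
                min_score=0.3, top_k=5):
    """Query the vector store with two fallbacks: keyword search if the
    store is down, and keyword search if no hit clears the threshold.
    search_fn / keyword_fallback are placeholders for real client calls;
    hits are assumed to be dicts with a "score" key.
    """
    try:
        hits = search_fn(query_vec, top_k)
    except Exception:
        # Vector DB unavailable or index rebuilding: degrade gracefully.
        return keyword_fallback(query_text, top_k)
    good = [h for h in hits if h["score"] >= min_score]
    if not good:
        # Nothing above threshold: keyword results beat feeding the LLM
        # low-similarity context it will confidently misuse.
        return keyword_fallback(query_text, top_k)
    return good
```

The min_score value is workload-dependent; pick it by inspecting the score distribution of known-good and known-bad retrievals, not by guessing.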

Skipping Evaluation

Changing your vector database, embedding model, or chunking strategy affects retrieval quality. Measure it. Build an evaluation set of queries with known relevant documents. Run it every time you change the retrieval stack. Without evaluation, you won't know if a change improved or degraded results until users complain.
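The evaluation set described above can be scored with a few lines of recall@k; retrieve stands in for whatever your retrieval stack exposes:

```python
def recall_at_k(eval_set, retrieve, k=5):
    """Mean fraction of relevant docs found in the top k results.

    eval_set: list of (query, set_of_relevant_doc_ids) pairs,
              each with at least one relevant doc.
    retrieve: callable(query, k) -> ranked list of doc ids
              (a placeholder for your real retrieval stack).
    """
    total = 0.0
    for query, relevant in eval_set:
        hits = set(retrieve(query, k))
        total += len(hits & relevant) / len(relevant)
    return total / len(eval_set)
```

Run this on every change to the retrieval stack (database, embedding model, chunking) and track the number over time; a single scalar per change is enough to catch most regressions before users do.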

Frequently Asked Questions

Which vector database should I choose?

It depends on scale and operational preferences. Under 1M vectors: pgvector (simplest, runs inside PostgreSQL). 1M-10M vectors: Qdrant or Weaviate (strong performance, reasonable ops burden). 10M-100M+ vectors: Pinecone (managed, scales effortlessly) or Milvus (self-hosted, highest throughput). For prototyping: Chroma (embedded, zero setup).

How much does a vector database cost?

Chroma: free (embedded). pgvector: cost of your PostgreSQL instance ($50-$500/month). Qdrant Cloud: $65-$1,200/month depending on vectors and replicas. Weaviate Cloud: $75-$1,500/month. Pinecone: $70-$2,000+/month for serverless, higher for pods. Milvus (self-hosted): infrastructure costs only ($200-$2,000/month for moderate scale). Self-hosted options trade lower costs for higher operational burden.

How fast are vector database queries?

At 1M vectors with 1536 dimensions: pgvector 15-30ms, Qdrant 5-10ms, Weaviate 8-18ms, Pinecone 10-20ms, Milvus 5-12ms. At 10M vectors: latencies roughly double. At 100M vectors: only Pinecone, Milvus, and Qdrant maintain sub-100ms p99 latency. These are approximate figures; actual performance depends heavily on hardware, indexing strategy, and query patterns.

Should I use pgvector or a dedicated vector database?

Use pgvector if: you already use PostgreSQL, your vectors are under 1M, you want to avoid adding another database to your stack, and you need to join vector search results with relational data. Use a dedicated vector database if: you need sub-10ms latency, you're scaling past 5M vectors, you need advanced filtering, or vector search is your primary workload.

What embedding model and dimensions should I use?

OpenAI text-embedding-3-small: 1536 dimensions (good default). text-embedding-3-large: 3072 dimensions (higher quality, 2x storage). Cohere embed-v3: 1024 dimensions (strong multilingual). For most RAG applications, 1536 dimensions balances quality and cost. You can reduce dimensions with Matryoshka embeddings if storage is a concern. Higher dimensions improve recall on nuanced semantic queries.

About the Author

Founder, AI Pulse

Rome Thorndike is the founder of AI Pulse, a career intelligence platform for AI professionals. He tracks the AI job market through analysis of thousands of active job postings, providing data-driven insights on salaries, skills, and hiring trends.
