Vector databases are the infrastructure layer that makes similarity search fast at scale. They have moved from a niche ML tool to core infrastructure in AI application development. Here is why they matter and how they work.
What Vector Databases Do
Traditional databases search for exact matches: “find all rows where name = ‘Berlin’”. Vector databases search by similarity: “find the 10 most semantically similar items to this query.” This works because the data is stored as embeddings: high-dimensional numeric vectors that encode semantic meaning. A text passage, an image, or an audio clip can all be converted to a vector, and vectors for semantically similar items end up close together in that high-dimensional space. The database provides efficient nearest-neighbour search across millions or billions of vectors. Done exactly, that search means comparing the query against every stored vector, which is too slow at this scale, so vector databases rely on specialised approximate indexing structures such as HNSW graphs, inverted-file indexes (IVFFlat), and product quantisation (PQ).
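To make the idea concrete, here is a minimal sketch of exact nearest-neighbour search using cosine similarity and only the Python standard library. The three-dimensional vectors and their keys are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions. The function scores every stored vector for each query, which is exactly the linear scan that indexes like HNSW are designed to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, vectors, k=2):
    """Exact search: score every stored vector, then keep the k best keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical toy embeddings, chosen by hand for this example.
vectors = {
    "berlin": [0.9, 0.1, 0.0],
    "munich": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

print(nearest_neighbours([1.0, 0.0, 0.0], vectors, k=2))  # ['berlin', 'munich']
```

The cost of this scan grows with both the number of vectors and their dimensionality, which is why it stops being practical long before you reach millions of items.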
Why AI Applications Need Them
Four types of AI application rely on vector databases. RAG (retrieval-augmented generation) finds relevant context in large document collections before prompting an LLM. Semantic search returns results based on meaning rather than keywords, which suits natural-language queries. Recommendation systems find items similar to what a user has shown interest in. Multimodal search finds images by text description, or audio by content. Each of these requires embedding all your content, storing the embeddings, and doing fast nearest-neighbour lookups at query time.
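The embed, store, and look-up steps can be sketched for the RAG case in a few lines. Everything here is a stand-in: `toy_embed` is a hypothetical word-count "embedding" that only captures word overlap, not meaning, and `InMemoryStore` is a made-up class that scans every item rather than using an index. In a real application an embedding model and a vector database would take their places.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical minimal store: keeps (text, vector) pairs and scans them all."""

    def __init__(self):
        self.items = []

    def add(self, text):
        # Step 1 and 2: embed the content once and store the result.
        self.items.append((text, toy_embed(text)))

    def query(self, text, k=1):
        # Step 3: embed the query and return the k most similar documents.
        q = toy_embed(text)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


store = InMemoryStore()
store.add("pgvector is a PostgreSQL extension for vector search")
store.add("Bananas are rich in potassium")

question = "Which PostgreSQL extension adds vector search?"
context = store.query(question, k=1)

# The retrieved text is placed in the prompt before the question is sent to an LLM.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

Semantic search and recommendations follow the same shape; only what gets embedded and what is done with the results changes.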
The Main Options
Pinecone: fully managed, simple API, expensive at scale; best for teams that want zero infrastructure overhead.
Weaviate: open-source with a managed option, strong multimodal support, more complex to operate than Pinecone.
Chroma: open-source, Python-native, excellent for local development and prototyping; production-ready with some caveats.
Qdrant: open-source with a managed cloud option, strong performance benchmarks, good Rust client.
pgvector: a PostgreSQL extension. If you already use PostgreSQL, it removes the need for a separate database for most use cases below roughly 10M vectors.
Faiss: Meta’s library for local vector search. It is powerful but not a database: there is no server or network API, and persistence is limited to saving and loading index files yourself.
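To get a feel for what 10M vectors means in practice, a quick back-of-the-envelope calculation helps. The sketch below assumes 1,536-dimensional embeddings stored as 32-bit floats; both numbers are assumptions for illustration, and the result covers only the raw vectors, not index structures or metadata.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value


# Assumed workload: 10 million vectors of 1,536 dimensions each.
size = raw_vector_bytes(10_000_000, 1536)
print(f"{size / 1e9:.2f} GB")  # 61.44 GB
```

At that size the vectors no longer fit comfortably in the memory of a small server, which is one reason the choice of database starts to matter around this scale.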
Practical Guidance
For prototyping: Chroma (runs locally with almost no setup). For production with existing PostgreSQL: pgvector (no separate service). For dedicated vector search at scale: Qdrant (open-source) or Pinecone (managed). The performance differences between options become significant above roughly 10M vectors; below that, any of them is usually fast enough. The embedding model choice matters more than the database choice: better embeddings produce better retrieval quality, and retrieval quality is the primary determinant of RAG application quality. OpenAI’s text-embedding-3-large and Voyage AI’s embedding models are among the quality leaders for text at the time of writing; note that Voyage AI is a separate provider that Anthropic’s documentation recommends, since Anthropic does not offer an embedding model through its own API. For multimodal embeddings, CLIP and ALIGN are the usual reference points.
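Since retrieval quality drives everything downstream, it is worth measuring it directly when comparing embedding models. One common metric is recall@k: of the documents known to be relevant to a query, what fraction shows up in the top k results? The sketch below is a minimal version; the document ids and the results for the two models are invented for illustration, and in practice you would need a small hand-labelled set of queries and relevant documents.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical results for the same two queries under two embedding models.
model_a = [(["d1", "d7", "d3"], ["d1", "d3"]), (["d4", "d2", "d9"], ["d2"])]
model_b = [(["d7", "d8", "d1"], ["d1", "d3"]), (["d9", "d4", "d5"], ["d2"])]

print(mean_recall_at_k(model_a, k=3))  # 1.0
print(mean_recall_at_k(model_b, k=3))  # 0.25
```

Running the same labelled queries against each candidate model gives a like-for-like comparison that does not depend on which database sits underneath.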




