Introduction

I’ve been playing with semantic search because sometimes keyword-based lookups just don’t cut it, especially when I want to find conceptually similar documents rather than ones that happen to share the same words. By using PostgreSQL’s pgvector extension, I can keep everything in a relational database I already use, while getting fast, approximate nearest-neighbor search on embeddings I generate locally.

Why I Like This Setup

  • Unified Stack: No need for a separate vector DB; just extend Postgres.
  • SQL Power: I can combine vector similarity with regular SQL filters (dates, tags, user IDs); see the sketch right after this list.
  • Simplicity: One database to back up, one set of credentials, and everything lives in my existing infra.
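
For example, here is a hypothetical query mixing similarity search with plain SQL filters. The created_at and tag columns are assumptions for illustration; the documents table and the <-> operator are covered in steps 3 and 4 below:

SELECT title
FROM documents
WHERE created_at > NOW() - INTERVAL '30 days'
  AND tag = 'postgres'
ORDER BY embedding <-> '[0.12, -0.34, ...]'::vector  -- query embedding literal
LIMIT 5;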

High-Level Flow

Here’s how my semantic search pipeline flows:

  1. Install pgvector – Enable the extension in Postgres.
  2. Embed Data – Generate embeddings for each document or text chunk with a local model.
  3. Store Embeddings – Insert embeddings into a vector column in a Postgres table.
  4. Query by Similarity – Use the <-> distance operator to find nearest neighbors to a query embedding.

I’ll dive into each step with code samples and tips that helped me avoid common pitfalls.


1. Installing pgvector

Why: pgvector adds a vector type and similarity operators right inside Postgres.

-- Run this once per database
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify it’s available
SELECT * FROM pg_extension WHERE extname = 'vector';

If you’re using Docker, a quick Dockerfile snippet:

FROM postgres:15
RUN apt-get update && apt-get install -y git postgresql-server-dev-15 build-essential && \
    git clone https://github.com/pgvector/pgvector.git && \
    cd pgvector && \
    make && make install

Once pgvector is installed, I restart my Postgres container or service, run the CREATE EXTENSION statement above, and I’m good to go.


2. Generating Embeddings

Why: I need consistent embeddings for both my stored data and any query text. My model: I stick with all-MiniLM-L6-v2, the same model my summarization pipeline uses, which keeps things consistent.

from transformers import AutoTokenizer, AutoModel
import torch

# Load tokenizer and model once at startup
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model     = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model.eval()  # inference only: disables dropout

def get_embedding(text):
    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool token embeddings, masking out padding so batched inputs stay correct
    mask = inputs['attention_mask'].unsqueeze(-1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return (summed / mask.sum(dim=1)).squeeze().tolist()

# Example
docs = ["This is a sample document.", "Another text goes here."]
embeddings = [get_embedding(d) for d in docs]
print(f"Generated {len(embeddings)} embeddings.")

I always sanity-check the embedding length (it should be 384 floats for this model) and occasionally eyeball a few values to make sure they aren’t all zeros or NaNs.
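
A minimal version of that sanity check; it assumes the embeddings list from the snippet above and uses only the standard library:

import math

for i, emb in enumerate(embeddings):
    assert len(emb) == 384, f"doc {i}: expected 384 dims, got {len(emb)}"
    assert any(v != 0.0 for v in emb), f"doc {i}: embedding is all zeros"
    assert not any(math.isnan(v) for v in emb), f"doc {i}: embedding contains NaN"
print("All embeddings look sane.")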


3. Storing Embeddings in Postgres

Why: Getting data into the DB ensures I can query it with SQL tools and integrate with my dashboards.

-- Table creation
DROP TABLE IF EXISTS documents;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding VECTOR(384)  -- dimension must match the embedding model (384 for all-MiniLM-L6-v2)
);

-- Inserting from Python (psycopg2 example)
import psycopg2

conn = psycopg2.connect(dbname='mydb', user='me', password='pass')
cur = conn.cursor()

for title, doc, emb in zip(titles, docs, embeddings):
    cur.execute(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s);",
        (title, doc, emb)
    )
conn.commit()

psycopg2 doesn’t know how to adapt a Python list to the vector type out of the box, so I pass str(emb), which renders as a '[...]' vector literal, and cast it with ::vector right in the SQL.
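
If the string cast feels hacky, the pgvector Python package (pip install pgvector) ships a psycopg2 adapter; a minimal sketch, assuming numpy is installed:

import numpy as np
from pgvector.psycopg2 import register_vector

register_vector(conn)  # teach this connection about the vector type

cur.execute(
    "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s);",
    ("A sample title", "A sample doc.", np.array(get_embedding("A sample doc.")))
)
conn.commit()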


4. Running Similarity Queries

Why: This is where the magic happens: finding similar docs by vector distance.

-- Find the top 5 docs nearest to a query embedding
-- (the vector literal is the output of get_embedding(), elided here)
SELECT id, title, content
FROM documents
ORDER BY embedding <-> '[0.12, -0.34, ...]'::vector
LIMIT 5;

In Python I wrap that into a helper:

def search_similar(query_text, k=5):
    q_emb = get_embedding(query_text)
    cur.execute(
        "SELECT title, content FROM documents ORDER BY embedding <-> %s::vector LIMIT %s;",
        (str(q_emb), k)  # same '[...]' vector-literal trick as the insert
    )
    return cur.fetchall()

# Usage
results = search_similar("How to set up a pgvector database?", k=3)
for title, content in results:
    print(f"- {title}")

A couple of gotchas I learned:

  • Indexing: For large tables (>10k rows), create an ivfflat index to speed up queries; build it after the bulk load so the list centroids reflect the real data distribution.

    CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
    
  • Analyze: Remember to ANALYZE documents after bulk loads so Postgres knows about data distribution.
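
  • Probes: An ivfflat index scans only ivfflat.probes lists per query (the default is 1), so recall can look worse right after indexing; raising it trades query speed for accuracy.

    SET ivfflat.probes = 10;  -- scan more lists: better recall, slower queries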


Wrapping Up

That’s my go-to setup for semantic search in Postgres. I can combine this with other SQL filters (dates, tags) or even join with metadata tables. All my data stays private, and I get flexible querying without the overhead of a specialized vector DB. Next, I’ll hook this into a simple Flask app so I can play with search in my browser.