Replace ChromaDB with pgvector for vector storage

Consolidate onto PostgreSQL by using pgvector instead of a separate
ChromaDB instance. This removes a Docker volume, a large dependency,
and simplifies the stack without meaningful performance impact at
our document scale.

- Swap langchain-chroma for langchain-postgres (PGVector)
- Use pgvector/pgvector:pg16 Docker image with init script
- Lazy-initialize vector store to avoid eager DB connections
- Add SQL helpers for stats/delete/list (replacing _collection access)
- Remove legacy main.py, chunker, petmd scraper, and /api/query endpoint

Re-index required after deploy (POST /api/rag/index + /index-obsidian).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-24 08:43:52 -04:00
parent 9ed4ca126a
commit 438399646f
19 changed files with 241 additions and 1690 deletions
+2 -2
View File
@@ -5,7 +5,8 @@ description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"chromadb>=1.1.0",
"langchain-postgres>=0.0.13",
"psycopg[binary]>=3.1.0",
"python-dotenv>=1.0.0",
"flask>=3.1.2",
"httpx>=0.28.1",
@@ -30,7 +31,6 @@ dependencies = [
"asyncpg>=0.30.0",
"langchain-openai>=1.1.6",
"langchain>=1.2.0",
"langchain-chroma>=1.0.0",
"langchain-community>=0.4.1",
"jq>=1.10.0",
"tavily-python>=0.7.17",