Replace ChromaDB with pgvector for vector storage

Consolidate onto PostgreSQL by using pgvector instead of a separate
ChromaDB instance. This removes a Docker volume and a large dependency,
and it simplifies the stack without meaningful performance impact at
our document scale.

- Swap langchain-chroma for langchain-postgres (PGVector)
- Use pgvector/pgvector:pg16 Docker image with init script
- Lazy-initialize vector store to avoid eager DB connections
- Add SQL helpers for stats/delete/list (replacing _collection access)
- Remove legacy main.py, chunker, petmd scraper, and /api/query endpoint
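
The lazy-initialization item above can be sketched roughly as follows. This
is a minimal illustration, not the code in this commit: LazyStore,
build_store, and vector_store are hypothetical names, and the factory
returns a placeholder dict where the real code would presumably construct
the PGVector store.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyStore(Generic[T]):
    """Defer building an expensive object until the first call to get()."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: take the lock only while the instance
        # is still missing, so later calls return without contention.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


calls = []


def build_store() -> dict:
    # Hypothetical stand-in for opening the database connection and
    # constructing the vector store; it only records that it ran.
    calls.append("connect")
    return {"backend": "pgvector"}


# Creating the wrapper at import time does not call build_store().
vector_store = LazyStore(build_store)
```

With this shape, importing the module opens no connection; the factory runs
on the first vector_store.get() and every later call reuses that instance.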

Re-index required after deploy (POST /api/rag/index + /index-obsidian).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 08:43:52 -04:00
parent 9ed4ca126a
commit 438399646f
19 changed files with 241 additions and 1690 deletions
+2 -3
@@ -37,15 +37,14 @@ WORKDIR /app/raggr-frontend
 RUN yarn install && yarn build
 WORKDIR /app
-# Create ChromaDB and database directories
-RUN mkdir -p /app/chromadb /app/database
+# Create database directory
+RUN mkdir -p /app/database
 # Expose port
 EXPOSE 8080
 # Set environment variables
 ENV PYTHONPATH=/app
-ENV CHROMADB_PATH=/app/chromadb
 # Run the startup script
 CMD ["./startup.sh"]