Compare commits

...

39 Commits

Author SHA1 Message Date
Ryan Chen add9946bc2 Improve Obsidian RAG retrieval for large vaults
- Markdown-aware chunking (split on headers before size-based splitting)
- Prepend note filename to each chunk for self-contained context
- Source-filtered retrieval (obsidian/paperless queries stay isolated)
- MMR search with k=8, fetch_k=24 for better recall and diversity
- Add source metadata to Paperless docs and folder metadata to Obsidian docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-04 13:34:15 -04:00
Ryan Chen 9bccac82f3 Add "Ask Simba" option to scheduled messages
When use_agent is enabled, the scheduler runs the message content as a
prompt through the LangChain agent and sends Simba's response instead of
the raw content. Frontend adds an Ask Simba toggle with visual indicator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-04 11:15:11 -04:00
Ryan Chen 489066940d Increase CharEnumField max_length to 20 for scheduled_messages
Fixes aerich migration failure: value too long for type character varying(10)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-04 10:54:41 -04:00
Ryan Chen 467e752629 Add recurring scheduled messages (daily, weekly, monthly)
Extend scheduled messages with a recurrence field. After sending a
recurring message, the scheduler automatically creates the next pending
occurrence. Frontend adds repeat toggle (Once/Daily/Weekly/Monthly) and
displays recurrence in the messages table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 23:30:30 -04:00
Ryan Chen f5203e0466 Add scheduled messages and strip markdown from iMessage responses
Strip markdown formatting (bold, italic, headers, code, links, lists) from
LLM responses before sending via iMessage. Add scheduled messages feature
with CRUD API, background scheduler loop, and admin frontend panel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 23:25:10 -04:00
Ryan Chen 3ba93c55f4 Add channel-scoped conversations for iMessage, WhatsApp, and email
Revert get_conversation_for_user to use Conversation.get() with
MultipleObjectsReturned fallback. Add channel field to Conversation
model and get_conversation_for_channel helper so each messaging
channel gets its own isolated conversation per user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:39:39 -04:00
Ryan Chen a693874662 Add SendBlue env vars to docker-compose
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:34:08 -04:00
Ryan Chen 846477075e Add link_imessage management script
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:31:12 -04:00
Ryan Chen 1e753bfaab Add SendBlue webhook signature validation
Validates sb-signing-secret header against SENDBLUE_WEBHOOK_SECRET env var.
Can be disabled with SENDBLUE_SIGNATURE_VALIDATION=false for development.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:28:35 -04:00
Ryan Chen 20576cabf3 Add SendBlue iMessage integration with admin-only access
- New imessage blueprint: webhook receives inbound iMessages, runs through
  LangChain agent, replies via SendBlue REST API
- Admin-only: only users with lldap_admin group can use iMessage channel
- Admin endpoints to link/unlink imessage_number on user accounts
- Add imessage_number field to User model (needs aerich migration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:22:15 -04:00
Ryan Chen 02dd3df1f9 Use calendar events list API for all-day event support
Switch from gws calendar +agenda to gws calendar events list with
explicit timeMin/timeMax and singleEvents=true to include all-day events.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:12:17 -04:00
Ryan Chen 33f19e704c Fix gws CLI npm package name to @googleworkspace/cli
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:04:10 -04:00
ryan 8eee58de60 ops 2026-06-03 19:02:18 -04:00
Ryan Chen 98c47d5507 Add read-only Google Calendar integration via gws CLI
Adds a get_calendar_events agent tool that shells out to `gws calendar +agenda`
for admin users. Controlled by GOOGLE_CALENDAR_ENABLED env var, with OAuth
credentials mounted from credentials.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 19:01:33 -04:00
Ryan Chen 9a149cdaa6 Use in-memory cache for obsidian indexed files instead of cross-engine DB query
The async/sync engine split caused visibility issues where newly indexed
files weren't found on the next cycle, triggering re-indexing of all 36
files every 60 seconds. Replace with a module-level dict that loads from
DB on cold start and stays in sync via cache updates after each indexing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 07:37:39 -04:00
Ryan Chen 00c9b44c0e Preserve wikilink text in Obsidian indexing and fix duplicate sync
Two fixes:
- Convert wikilinks to display text instead of stripping them entirely.
  [[Noah]] becomes "Noah", [[target|display]] becomes "display". This
  was causing names and references in wikilinks to be invisible to search.
- Switch _get_obsidian_indexed_files to async engine to avoid stale reads
  from the separate sync engine, which caused files to be re-indexed
  every cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 07:35:24 -04:00
Ryan Chen 73e952c617 Set Obsidian sync to pull-only mode
Prevents local changes from being pushed back to Obsidian servers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 07:29:18 -04:00
Ryan Chen 9f51dc3cdb Fix NOT NULL violation on empty splits and increase search results to k=6
Empty documents after sanitization caused aadd_documents to issue a
DEFAULT VALUES insert. Guard with an emptiness check. Also increase
similarity search k from 2 to 6 so multi-word queries like full names
have better recall.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 07:16:28 -04:00
Ryan Chen 1e6bc536b4 Fix datetime serialization in Obsidian metadata for pgvector
YAML frontmatter can contain datetime objects which aren't JSON
serializable. Add _make_serializable() to coerce all metadata values
before storing in pgvector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 07:11:29 -04:00
Ryan Chen 869de1c250 Add incremental Obsidian-to-pgvector sync with background watcher
Replace full delete-and-reindex with mtime-based incremental sync that
only re-indexes changed/new files and removes deleted ones. A background
polling task keeps the vector store up-to-date automatically when
OBSIDIAN_CONTINUOUS_SYNC=true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 07:05:48 -04:00
Ryan Chen 2cd77c68c1 Fix daily note path to match vault structure
Update from journal/YYYY/YYYY-MM-DD.md to
50 - Journal/YYYY/MM/YYYY-MM-DD.md to match the actual Obsidian vault
folder layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 00:08:40 -04:00
Ryan Chen ad5b889bf1 Remove stale Obsidian sync lock before starting continuous sync
The ob CLI uses a directory lock at .obsidian/.sync.lock that persists
across container restarts via the volume mount, causing "Another sync
instance is already running" errors. Remove it before starting sync.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 00:07:37 -04:00
Ryan Chen 75e6b09464 Guard against missing Obsidian credentials to prevent startup hang
ob login blocks waiting for interactive input when OBSIDIAN_EMAIL or
OBSIDIAN_PASSWORD is empty. Check required env vars before attempting
login to skip sync gracefully with a warning instead of hanging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-31 00:02:28 -04:00
Ryan Chen 47238f8567 Fix Obsidian sync race condition and block credentials.json from being served
Run ob login and sync-setup in foreground before backgrounding sync to
prevent "Another sync instance is already running" error. Restrict the
catch-all route to only serve whitelisted static file extensions to
prevent sensitive files like credentials.json from being exposed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 23:59:29 -04:00
Ryan Chen 5e0e2994c2 Fix Obsidian sync setup in Docker by running login and sync-setup before sync
Replace OBSIDIAN_AUTH_TOKEN with OBSIDIAN_EMAIL/OBSIDIAN_PASSWORD and run
the full ob login → sync-setup → sync sequence on container startup so
fresh containers authenticate properly instead of failing with
"Run 'ob sync-setup' first".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 23:52:30 -04:00
Ryan Chen 9629bfcef4 Fix embedding tokenizer mismatch with custom embedding server
Disable tiktoken pre-encoding for custom embedding servers. LangChain
was encoding text into OpenAI token IDs then sending them to llama-server
which has a different vocabulary, causing "invalid tokens" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-11 23:42:23 -04:00
Ryan Chen b4097730ef Add per-chunk error logging and broaden text sanitizer
Indexes chunks one at a time with error logging to identify which
document/chunk causes embedding failures. Also strips Unicode surrogates
and replacement characters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-11 23:38:03 -04:00
Ryan Chen abb06b78e2 Sanitize document text before embedding to fix tokenizer errors
Strips null bytes, control characters, and excessive whitespace from
document content before sending to embedding models. Fixes 400 errors
from BERT-based tokenizers (e.g. nomic-embed-text) on PDF-extracted text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-11 23:35:25 -04:00
Ryan Chen 92171cbfb6 Support custom OpenAI-compatible embedding server with OpenAI fallback
Adds EMBEDDING_SERVER_URL and EMBEDDING_MODEL_NAME env vars, mirroring
the existing LLAMA_SERVER_URL pattern for LLM configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-11 23:24:54 -04:00
ryan 8e884b5e76 Merge pull request 'Refactor frontend to hook-based architecture' (#33) from refactor/frontend-hooks into main
Reviewed-on: #33
2026-04-24 09:12:47 -04:00
ryan ed973357e8 Merge pull request 'Improve Simba system prompt' (#31) from feat/improve-system-prompt into main
Reviewed-on: #31
2026-04-24 09:12:39 -04:00
Ryan Chen 4ac0754ea7 Refactor frontend to hook-based architecture
Extract logic from god components into custom hooks (useAuthCheck,
useConversations, useChat, usePresignedUrl, useAdminUsers, useOIDCAuth).
Eliminate unnecessary useEffects per React guidelines — scroll is now
imperative, isAdmin comes from useAuthCheck instead of a separate fetch.
ConversationList becomes a pure presentational component. Wrap bubble
components in React.memo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 09:11:57 -04:00
Ryan Chen db977270a3 Improve Simba system prompt for more helpful responses
Shift focus from cat persona to genuine helpfulness. Keep light
cat flavor but prioritize thorough, detailed answers over the
assertive cat act.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 08:58:29 -04:00
ryan bac773ae4b Merge pull request 'Enable async_mode on PGVector for async method support' (#30) from refactor/chromadb-to-pgvector into main
Reviewed-on: #30
2026-04-24 08:53:34 -04:00
Ryan Chen 564a9b68a5 Enable async_mode on PGVector for async method support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 08:53:21 -04:00
ryan 7742673cc0 Merge pull request 'Handle missing pgvector tables on first run' (#29) from refactor/chromadb-to-pgvector into main
Reviewed-on: #29
2026-04-24 08:49:38 -04:00
Ryan Chen c157c37cde Handle missing pgvector tables on first run
_get_collection_id now catches the UndefinedTable error that occurs
before the first index operation creates the langchain tables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 08:49:00 -04:00
ryan 3b8fa3e7a0 Merge pull request 'Replace ChromaDB with pgvector' (#28) from refactor/chromadb-to-pgvector into main
Reviewed-on: #28
2026-04-24 08:44:39 -04:00
Ryan Chen 438399646f Replace ChromaDB with pgvector for vector storage
Consolidate onto PostgreSQL by using pgvector instead of a separate
ChromaDB instance. This removes a Docker volume, a large dependency,
and simplifies the stack without meaningful performance impact at
our document scale.

- Swap langchain-chroma for langchain-postgres (PGVector)
- Use pgvector/pgvector:pg16 Docker image with init script
- Lazy-initialize vector store to avoid eager DB connections
- Add SQL helpers for stats/delete/list (replacing _collection access)
- Remove legacy main.py, chunker, petmd scraper, and /api/query endpoint

Re-index required after deploy (POST /api/rag/index + /index-obsidian).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 08:43:52 -04:00
55 changed files with 2560 additions and 2172 deletions
+28 -6
View File
@@ -19,10 +19,11 @@ BASE_URL=192.168.1.5:8000
LLAMA_SERVER_URL=http://192.168.1.213:8080/v1
LLAMA_MODEL_NAME=llama-3.1-8b-instruct
# ChromaDB Configuration
# For Docker: This is automatically set to /app/data/chromadb
# For local development: Set to a local directory path
CHROMADB_PATH=./data/chromadb
# Embedding Server Configuration
# If set, uses a custom OpenAI-compatible embedding server (e.g. llama-server)
# Falls back to OpenAI embeddings if not set
EMBEDDING_SERVER_URL=http://192.168.1.7:8086/v1
EMBEDDING_MODEL_NAME=all-minilm
# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
@@ -92,9 +93,30 @@ EMAIL_HMAC_SECRET=
# Set to false to disable Mailgun signature validation in development
MAILGUN_SIGNATURE_VALIDATION=true
# SendBlue Configuration (iMessage)
SENDBLUE_API_KEY=your-sendblue-api-key
SENDBLUE_API_SECRET=your-sendblue-api-secret
SENDBLUE_FROM_NUMBER=+1XXXXXXXXXX
SENDBLUE_WEBHOOK_SECRET=your-sendblue-webhook-secret
# Set to false to disable SendBlue signature validation in development
SENDBLUE_SIGNATURE_VALIDATION=true
# Comma-separated list of iMessage numbers allowed to use the service (E.164 format)
# Use * to allow any number
ALLOWED_IMESSAGE_NUMBERS=
# Rate limiting: max messages per window (default: 10 messages per 60 seconds)
# IMESSAGE_RATE_LIMIT_MAX=10
# IMESSAGE_RATE_LIMIT_WINDOW=60
# Google Calendar Configuration (via gws CLI)
GOOGLE_CALENDAR_ENABLED=true
# Export credentials: gws auth export --unmasked > credentials.json
# The file is mounted into the container at /app/config/gws-credentials.json
GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE=/app/config/gws-credentials.json
# Obsidian Configuration (headless sync)
# Auth token from Obsidian account (Settings → Account → API token)
OBSIDIAN_AUTH_TOKEN=your-obsidian-auth-token
# Obsidian account credentials (used for `ob login` on container startup)
OBSIDIAN_EMAIL=your-obsidian-email
OBSIDIAN_PASSWORD=your-obsidian-password
# Vault ID to sync (found in Obsidian sync settings)
OBSIDIAN_VAULT_ID=your-vault-id
# End-to-end encryption password (if vault uses E2E encryption)
+1 -3
View File
@@ -11,11 +11,9 @@ wheels/
# Environment files
.env
credentials.json
# Database files
chromadb/
chromadb_openai/
chroma_db/
database/
*.db
+3 -4
View File
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview
SimbaRAG is a RAG (Retrieval-Augmented Generation) conversational AI system for querying information about Simba (a cat). It ingests documents from Paperless-NGX, stores embeddings in ChromaDB, and uses LLMs (Ollama or OpenAI) to answer questions.
SimbaRAG is a RAG (Retrieval-Augmented Generation) conversational AI system for querying information about Simba (a cat). It ingests documents from Paperless-NGX, stores embeddings in PostgreSQL via pgvector, and uses LLMs (Ollama or OpenAI) to answer questions.
## Commands
@@ -54,9 +54,8 @@ docker compose up -d
│ Docker Compose │
├─────────────────────────────────────────────────────────────┤
│ raggr (port 8080) │ postgres (port 5432) │
│ ├── Quart backend │ PostgreSQL 16
── React frontend (served) │ │
│ └── ChromaDB (volume) │ │
│ ├── Quart backend │ PostgreSQL 16 + pgvector
── React frontend (served) │ │
└─────────────────────────────────────────────────────────────┘
```
+3 -4
View File
@@ -8,7 +8,7 @@ RUN apt-get update && apt-get install -y \
curl \
&& curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
&& apt-get install -y nodejs \
&& npm install -g yarn obsidian-headless \
&& npm install -g yarn obsidian-headless @googleworkspace/cli \
&& rm -rf /var/lib/apt/lists/* \
&& curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -37,15 +37,14 @@ WORKDIR /app/raggr-frontend
RUN yarn install && yarn build
WORKDIR /app
# Create ChromaDB and database directories
RUN mkdir -p /app/chromadb /app/database
# Create database directory
RUN mkdir -p /app/database
# Expose port
EXPOSE 8080
# Set environment variables
ENV PYTHONPATH=/app
ENV CHROMADB_PATH=/app/chromadb
# Run the startup script
CMD ["./startup.sh"]
+2 -3
View File
@@ -34,16 +34,15 @@ COPY . .
WORKDIR /app/raggr-frontend
RUN yarn build
# Create ChromaDB and database directories
# Create database directory
WORKDIR /app
RUN mkdir -p /app/chromadb /app/database
RUN mkdir -p /app/database
# Make startup script executable
RUN chmod +x /app/startup-dev.sh
# Set environment variables
ENV PYTHONPATH=/app
ENV CHROMADB_PATH=/app/chromadb
ENV PYTHONUNBUFFERED=1
# Expose port
+70 -37
View File
@@ -1,9 +1,10 @@
import asyncio
import logging
import os
from datetime import timedelta
from dotenv import load_dotenv
from quart import Quart, jsonify, render_template, request, send_from_directory
from quart import Quart, jsonify, render_template, send_from_directory
from quart_jwt_extended import JWTManager, get_jwt_identity, jwt_refresh_token_required
from tortoise import Tortoise
@@ -13,9 +14,10 @@ import blueprints.email
import blueprints.rag
import blueprints.users
import blueprints.whatsapp
import blueprints.imessage
import blueprints.scheduled_messages
import blueprints.users.models
from config.db import TORTOISE_CONFIG
from main import consult_simba_oracle
# Load environment variables
load_dotenv()
@@ -50,6 +52,29 @@ app.register_blueprint(blueprints.conversation.conversation_blueprint)
app.register_blueprint(blueprints.email.email_blueprint)
app.register_blueprint(blueprints.rag.rag_blueprint)
app.register_blueprint(blueprints.whatsapp.whatsapp_blueprint)
app.register_blueprint(blueprints.imessage.imessage_blueprint)
app.register_blueprint(blueprints.scheduled_messages.scheduled_messages_blueprint)
async def _obsidian_sync_loop():
"""Background task that incrementally syncs Obsidian documents to pgvector."""
from blueprints.rag.logic import sync_obsidian_documents
interval = int(os.getenv("OBSIDIAN_SYNC_INTERVAL", "60"))
logger = logging.getLogger("obsidian_sync")
logger.info(f"Obsidian sync watcher started (interval={interval}s)")
while True:
try:
result = await sync_obsidian_documents()
if result["added"] or result["updated"] or result["deleted"]:
logger.info(
f"Obsidian sync: {result['added']} added, "
f"{result['updated']} updated, {result['deleted']} deleted"
)
except Exception:
logger.exception("Obsidian sync error")
await asyncio.sleep(interval)
# Initialize Tortoise ORM with lifecycle hooks
@@ -58,7 +83,21 @@ async def lifespan():
logging.info("Initializing Tortoise ORM...")
await Tortoise.init(config=TORTOISE_CONFIG)
logging.info("Tortoise ORM initialized successfully")
watcher_task = None
if os.getenv("OBSIDIAN_CONTINUOUS_SYNC") == "true":
watcher_task = asyncio.create_task(_obsidian_sync_loop())
from blueprints.scheduled_messages.scheduler import scheduled_messages_loop
scheduler_task = asyncio.create_task(scheduled_messages_loop())
yield
scheduler_task.cancel()
if watcher_task is not None:
watcher_task.cancel()
logging.info("Closing Tortoise ORM connections...")
await Tortoise.close_connections()
@@ -69,48 +108,42 @@ async def static_files(filename):
return await send_from_directory(app.static_folder, filename)
# Allowed file extensions for static frontend assets
ALLOWED_STATIC_EXTENSIONS = {
".html",
".css",
".js",
".svg",
".png",
".ico",
".jpg",
".jpeg",
".webp",
".woff",
".woff2",
".ttf",
".txt",
}
# JSON files explicitly allowed to be served (e.g. PWA manifest)
ALLOWED_JSON_FILES = {"manifest.json"}
# Serve the React app for all routes (catch-all)
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
async def serve_react_app(path):
if path and os.path.exists(os.path.join(app.template_folder, path)):
return await send_from_directory(app.template_folder, path)
if path:
ext = os.path.splitext(path)[1].lower()
basename = os.path.basename(path)
allowed = ext in ALLOWED_STATIC_EXTENSIONS or (
ext == ".json" and basename in ALLOWED_JSON_FILES
)
if allowed and os.path.exists(os.path.join(app.template_folder, path)):
return await send_from_directory(app.template_folder, path)
return await render_template("index.html")
@app.route("/api/query", methods=["POST"])
@jwt_refresh_token_required
async def query():
current_user_uuid = get_jwt_identity()
user = await blueprints.users.models.User.get(id=current_user_uuid)
data = await request.get_json()
query = data.get("query")
conversation_id = data.get("conversation_id")
conversation = await blueprints.conversation.logic.get_conversation_by_id(
conversation_id
)
await conversation.fetch_related("messages")
await blueprints.conversation.logic.add_message_to_conversation(
conversation=conversation,
message=query,
speaker="user",
user=user,
)
transcript = await blueprints.conversation.logic.get_conversation_transcript(
user=user, conversation=conversation
)
response = consult_simba_oracle(input=query, transcript=transcript)
await blueprints.conversation.logic.add_message_to_conversation(
conversation=conversation,
message=response,
speaker="simba",
user=user,
)
return jsonify({"response": response})
@app.route("/api/messages", methods=["GET"])
@jwt_refresh_token_required
async def get_messages():
+6 -2
View File
@@ -96,7 +96,9 @@ async def query():
conversation, query, system_prompt=system_prompt
)
payload = {"messages": messages_payload}
agent_config = {"configurable": {"user_id": str(user.id)}}
agent_config = {
"configurable": {"user_id": str(user.id), "is_admin": user.is_admin()}
}
response = await main_agent.ainvoke(payload, config=agent_config)
message = response.get("messages", [])[-1].content
@@ -183,7 +185,9 @@ async def stream_query():
conversation, query_text or "", image_description, system_prompt=system_prompt
)
payload = {"messages": messages_payload}
agent_config = {"configurable": {"user_id": str(user.id)}}
agent_config = {
"configurable": {"user_id": str(user.id), "is_admin": user.is_admin()}
}
async def event_generator():
final_message = None
+78 -5
View File
@@ -65,9 +65,10 @@ def get_current_date() -> str:
Returns:
Today's date in YYYY-MM-DD format
"""
from datetime import date
from datetime import datetime
from zoneinfo import ZoneInfo
return date.today().isoformat()
return datetime.now(ZoneInfo("America/New_York")).strftime("%Y-%m-%d")
@tool
@@ -120,7 +121,7 @@ async def simba_search(query: str):
Relevant information from Simba's documents
"""
print(f"[SIMBA SEARCH] Tool called with query: {query}")
serialized, docs = await query_vector_store(query=query)
serialized, docs = await query_vector_store(query=query, source="paperless")
print(f"[SIMBA SEARCH] Found {len(docs)} documents")
print(f"[SIMBA SEARCH] Serialized result length: {len(serialized)}")
print(f"[SIMBA SEARCH] First 200 chars: {serialized[:200]}")
@@ -328,8 +329,8 @@ async def obsidian_search_notes(query: str) -> str:
return "Obsidian integration is not configured. Please set OBSIDIAN_VAULT_PATH environment variable."
try:
# Query ChromaDB for obsidian documents
serialized, docs = await query_vector_store(query=query)
# Query vector store filtered to obsidian source only
serialized, docs = await query_vector_store(query=query, source="obsidian")
return serialized
except Exception as e:
@@ -618,6 +619,76 @@ async def save_user_memory(content: str, config: RunnableConfig) -> str:
return await save_memory(user_id=user_id, content=content)
@tool
async def get_calendar_events(
time_range: str = "today",
days: int = 0,
calendar_id: str = "primary",
*,
config: RunnableConfig,
) -> str:
"""Get upcoming Google Calendar events including all-day events.
Use this tool when the user asks about:
- What's on their calendar today or this week
- Upcoming meetings or events
- Scheduling or availability questions
Args:
time_range: One of "today", "tomorrow", or "week" (default: "today")
days: If set to a positive number, show events for this many upcoming days
(overrides time_range)
calendar_id: Calendar ID to query (default: "primary")
Returns:
Calendar events as JSON
"""
if not config["configurable"].get("is_admin"):
return "Calendar access is restricted to admin users."
import asyncio
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
tz = ZoneInfo("America/New_York")
now = datetime.now(tz)
start = now.replace(hour=0, minute=0, second=0, microsecond=0)
if days > 0:
end = start + timedelta(days=days)
elif time_range == "tomorrow":
start = start + timedelta(days=1)
end = start + timedelta(days=1)
elif time_range == "week":
end = start + timedelta(days=7)
else:
end = start + timedelta(days=1)
cmd = [
"gws",
"calendar",
"events",
"list",
"--calendarId",
calendar_id,
"--timeMin",
start.isoformat(),
"--timeMax",
end.isoformat(),
"--singleEvents",
"true",
"--orderBy",
"startTime",
]
proc = await asyncio.create_subprocess_exec(
*cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
)
stdout, stderr = await proc.communicate()
if proc.returncode != 0:
return f"Calendar error: {stderr.decode()}"
return stdout.decode()
# Create tools list based on what's available
tools = [get_current_date, simba_search, web_search, save_user_memory]
if ynab_enabled:
@@ -642,6 +713,8 @@ if obsidian_enabled:
journal_complete_task,
]
)
if os.getenv("GOOGLE_CALENDAR_ENABLED"):
tools.append(get_calendar_events)
# Llama 3.1 supports native function calling via OpenAI-compatible API
main_agent = create_agent(model=model_with_fallback, tools=tools)
+19 -3
View File
@@ -47,11 +47,27 @@ async def get_the_only_conversation() -> Conversation:
async def get_conversation_for_user(user: blueprints.users.models.User) -> Conversation:
try:
return await Conversation.get(user=user)
conversation = await Conversation.get(user=user)
except tortoise.exceptions.MultipleObjectsReturned:
conversation = (
await Conversation.filter(user=user).order_by("created_at").first()
)
except tortoise.exceptions.DoesNotExist:
await Conversation.get_or_create(name=f"{user.username}'s chat", user=user)
conversation = await Conversation.create(
name=f"{user.username}'s chat", user=user
)
return conversation
return await Conversation.get(user=user)
async def get_conversation_for_channel(
user: blueprints.users.models.User, channel: str
) -> Conversation:
conversation = await Conversation.filter(user=user, channel=channel).first()
if conversation is None:
conversation = await Conversation.create(
name=f"{user.username}'s {channel} chat", user=user, channel=channel
)
return conversation
async def get_conversation_by_id(id: str) -> Conversation:
+1
View File
@@ -21,6 +21,7 @@ class Conversation(Model):
user: fields.ForeignKeyRelation = fields.ForeignKeyField(
"models.User", related_name="conversations", null=True
)
channel = fields.CharField(max_length=20, default="web", null=True)
class Meta:
table = "conversations"
+2 -2
View File
@@ -1,4 +1,4 @@
SIMBA_SYSTEM_PROMPT = """You are a helpful cat assistant named Simba that understands veterinary terms. When there are questions to you specifically, they are referring to Simba the cat. Answer the user in as if you were a cat named Simba. Don't act too catlike. Be assertive.
SIMBA_SYSTEM_PROMPT = """You are Simba, Ryan's helpful personal assistant. You're named after his orange cat. You have a warm, friendly personality with a light cat-themed touch, but your priority is always being genuinely useful — give thorough, detailed answers and think things through carefully. When asked about Simba the cat, you speak as him in first person. For everything else, you're just a great assistant who happens to have a cat's name.
SIMBA FACTS (as of January 2026):
- Name: Simba
@@ -49,7 +49,7 @@ You have access to Ryan's Obsidian vault through the Obsidian integration. When
Always use these tools when users ask about notes, research, ideas, tasks, or when you want to save information for future reference.
DAILY JOURNAL (Task Tracking):
You have access to Ryan's daily journal notes. Each note lives at journal/YYYY/YYYY-MM-DD.md and has two sections: tasks and log.
You have access to Ryan's daily journal notes. Each note lives at 50 - Journal/YYYY/MM/YYYY-MM-DD.md and has two sections: tasks and log.
- Use journal_get_today to read today's full daily note (tasks + log)
- Use journal_get_tasks to list tasks (done/pending) for today or a specific date
- Use journal_add_task to add a new task to today's (or a given date's) note
+2 -2
View File
@@ -11,7 +11,7 @@ from quart import Blueprint, request
from blueprints.users.models import User
from blueprints.conversation.logic import (
get_conversation_for_user,
get_conversation_for_channel,
add_message_to_conversation,
)
from blueprints.conversation.agents import main_agent
@@ -176,7 +176,7 @@ async def webhook():
# Get or create conversation
try:
conversation = await get_conversation_for_user(user=user)
conversation = await get_conversation_for_channel(user=user, channel="email")
await conversation.fetch_related("messages")
except Exception as e:
logger.error(f"Failed to get conversation for user {user.username}: {e}")
+231
View File
@@ -0,0 +1,231 @@
import os
import hmac
import logging
import functools
import time
from collections import defaultdict
import httpx
from quart import Blueprint, request, jsonify
from blueprints.users.models import User
from blueprints.conversation.logic import (
get_conversation_for_channel,
add_message_to_conversation,
)
from blueprints.conversation.agents import main_agent
from blueprints.conversation.prompts import SIMBA_SYSTEM_PROMPT
imessage_blueprint = Blueprint("imessage_api", __name__, url_prefix="/api/imessage")
logger = logging.getLogger(__name__)
# Rate limiting: per-number message timestamps
_rate_limit_store: dict[str, list[float]] = defaultdict(list)
RATE_LIMIT_MAX = int(os.getenv("IMESSAGE_RATE_LIMIT_MAX", "10"))
RATE_LIMIT_WINDOW = int(os.getenv("IMESSAGE_RATE_LIMIT_WINDOW", "60"))
MAX_MESSAGE_LENGTH = 2000
SENDBLUE_API_BASE = "https://api.sendblue.co"
def _get_sendblue_headers() -> dict[str, str]:
return {
"sb-api-key-id": os.getenv("SENDBLUE_API_KEY", ""),
"sb-api-secret-key": os.getenv("SENDBLUE_API_SECRET", ""),
"Content-Type": "application/json",
}
def _check_rate_limit(phone_number: str) -> bool:
"""Check if a phone number has exceeded the rate limit.
Returns True if the request is allowed, False if rate-limited.
"""
now = time.monotonic()
cutoff = now - RATE_LIMIT_WINDOW
timestamps = _rate_limit_store[phone_number]
_rate_limit_store[phone_number] = [t for t in timestamps if t > cutoff]
if len(_rate_limit_store[phone_number]) >= RATE_LIMIT_MAX:
return False
_rate_limit_store[phone_number].append(now)
return True
async def send_imessage(to_number: str, content: str) -> dict:
"""Send an iMessage via SendBlue API."""
from_number = os.getenv("SENDBLUE_FROM_NUMBER", "")
async with httpx.AsyncClient() as client:
resp = await client.post(
f"{SENDBLUE_API_BASE}/api/send-message",
headers=_get_sendblue_headers(),
json={
"number": to_number,
"from_number": from_number,
"content": content,
},
timeout=30,
)
resp.raise_for_status()
return resp.json()
def validate_sendblue_signature(f):
"""Decorator to validate the SendBlue webhook signing secret."""
@functools.wraps(f)
async def decorated_function(*args, **kwargs):
if os.getenv("SENDBLUE_SIGNATURE_VALIDATION", "true").lower() == "false":
return await f(*args, **kwargs)
secret = os.getenv("SENDBLUE_WEBHOOK_SECRET")
if not secret:
logger.error("SENDBLUE_WEBHOOK_SECRET not set — rejecting request")
return jsonify({"error": "Server misconfigured"}), 500
sig = request.headers.get("sb-signing-secret", "")
if not hmac.compare_digest(sig, secret):
logger.warning("Invalid SendBlue signing secret")
return jsonify({"error": "Unauthorized"}), 403
return await f(*args, **kwargs)
return decorated_function
@imessage_blueprint.route("/webhook", methods=["POST"])
@validate_sendblue_signature
async def webhook():
"""Handle incoming iMessages from SendBlue."""
data = await request.get_json()
if not data:
return jsonify({"error": "Invalid payload"}), 400
from_number = data.get("from_number")
content = data.get("content")
is_outbound = data.get("is_outbound", False)
# Ignore outbound messages (our own replies echoed back)
if is_outbound:
return jsonify({"status": "ignored"}), 200
if not from_number or not content:
return jsonify({"error": "Missing from_number or content"}), 400
content = content.strip()
if not content:
await send_imessage(
from_number, "I received an empty message. Please send some text!"
)
return jsonify({"status": "ok"}), 200
# Rate limiting
if not _check_rate_limit(from_number):
logger.warning(f"Rate limit exceeded for {from_number}")
await send_imessage(
from_number,
"You're sending messages too quickly. Please wait a moment and try again.",
)
return jsonify({"status": "rate_limited"}), 200
# Truncate overly long messages
if len(content) > MAX_MESSAGE_LENGTH:
content = content[:MAX_MESSAGE_LENGTH]
logger.info(
f"Truncated long message from {from_number} to {MAX_MESSAGE_LENGTH} chars"
)
logger.info(f"Received iMessage from {from_number}: {content[:100]}")
# Identify or create user
user = await User.filter(imessage_number=from_number).first()
if not user:
allowed_numbers = os.getenv("ALLOWED_IMESSAGE_NUMBERS", "").split(",")
if from_number not in allowed_numbers and "*" not in allowed_numbers:
await send_imessage(
from_number, "Sorry, you are not authorized to use this service."
)
return jsonify({"status": "unauthorized"}), 200
username = f"im_{from_number.lstrip('+')}"
try:
user = await User.create(
username=username,
email=f"{username}@imessage.simbarag.local",
imessage_number=from_number,
auth_provider="imessage",
)
logger.info(f"Created new user for iMessage: {username}")
except Exception as e:
logger.error(f"Failed to create user for {from_number}: {e}")
await send_imessage(
from_number, "Sorry, something went wrong setting up your account."
)
return jsonify({"status": "error"}), 200
# iMessage is restricted to admins
if not user.is_admin():
logger.warning(f"Non-admin user {user.username} attempted iMessage access")
await send_imessage(from_number, "Sorry, this feature is restricted to admins.")
return jsonify({"status": "forbidden"}), 200
# Get or create conversation
try:
conversation = await get_conversation_for_channel(user=user, channel="imessage")
await conversation.fetch_related("messages")
except Exception as e:
logger.error(f"Failed to get conversation for user {user.username}: {e}")
await send_imessage(
from_number, "Sorry, something went wrong. Please try again later."
)
return jsonify({"status": "error"}), 200
# Add user message to conversation
await add_message_to_conversation(
conversation=conversation,
message=content,
speaker="user",
user=user,
)
# Build messages payload for LangChain agent
try:
messages = await conversation.messages.all()
recent_messages = list(messages)[-10:]
messages_payload = [{"role": "system", "content": SIMBA_SYSTEM_PROMPT}]
for msg in recent_messages[:-1]:
role = "user" if msg.speaker == "user" else "assistant"
messages_payload.append({"role": role, "content": msg.text})
messages_payload.append({"role": "user", "content": content})
logger.info(f"Invoking LangChain agent with {len(messages_payload)} messages")
response = await main_agent.ainvoke({"messages": messages_payload})
response_text = response.get("messages", [])[-1].content
except Exception as e:
logger.error(f"Error invoking agent: {e}")
response_text = "Sorry, I'm having trouble thinking right now."
# Save and send response
await add_message_to_conversation(
conversation=conversation,
message=response_text,
speaker="simba",
user=user,
)
from utils.strip_markdown import strip_markdown
await send_imessage(from_number, strip_markdown(response_text))
return jsonify({"status": "ok"}), 200
+9 -11
View File
@@ -1,7 +1,12 @@
from quart import Blueprint, jsonify
from quart_jwt_extended import jwt_refresh_token_required
from .logic import fetch_obsidian_documents, get_vector_store_stats, index_documents, index_obsidian_documents, vector_store
from .logic import (
delete_all_documents,
get_vector_store_stats,
index_documents,
sync_obsidian_documents,
)
from blueprints.users.decorators import admin_required
rag_blueprint = Blueprint("rag_api", __name__, url_prefix="/api/rag")
@@ -32,14 +37,7 @@ async def trigger_index():
async def trigger_reindex():
"""Clear and reindex all documents. Admin only."""
try:
# Clear existing documents
collection = vector_store._collection
all_docs = collection.get()
if all_docs["ids"]:
collection.delete(ids=all_docs["ids"])
# Reindex
delete_all_documents()
await index_documents()
stats = get_vector_store_stats()
return jsonify({"status": "success", "stats": stats})
@@ -50,9 +48,9 @@ async def trigger_reindex():
@rag_blueprint.post("/index-obsidian")
@admin_required
async def trigger_obsidian_index():
"""Index all Obsidian markdown documents into vector store. Admin only."""
"""Incrementally sync Obsidian documents into vector store. Admin only."""
try:
result = await index_obsidian_documents()
result = await sync_obsidian_documents()
stats = get_vector_store_stats()
return jsonify({"status": "success", "result": result, "stats": stats})
except Exception as e:
+384 -42
View File
@@ -1,11 +1,19 @@
import datetime
import logging
import os
import re
import time
from pathlib import Path
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres import PGVector
from langchain_text_splitters import (
MarkdownHeaderTextSplitter,
RecursiveCharacterTextSplitter,
)
from sqlalchemy import create_engine, text
from .fetchers import PaperlessNGXService
from utils.obsidian_service import ObsidianService
@@ -13,13 +21,51 @@ from utils.obsidian_service import ObsidianService
# Load environment variables
load_dotenv()
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
logger = logging.getLogger(__name__)
vector_store = Chroma(
collection_name="simba_docs",
embedding_function=embeddings,
persist_directory=os.getenv("CHROMADB_PATH", ""),
_embedding_server_url = os.getenv("EMBEDDING_SERVER_URL")
_embedding_model = os.getenv("EMBEDDING_MODEL_NAME", "text-embedding-3-small")
if _embedding_server_url:
embeddings = OpenAIEmbeddings(
model=_embedding_model,
base_url=_embedding_server_url,
api_key="not-needed",
check_embedding_ctx_length=False,
)
else:
embeddings = OpenAIEmbeddings(model=_embedding_model)
# Convert Tortoise-style postgres:// URL to SQLAlchemy-style postgresql+psycopg://
_db_url = os.getenv(
"DATABASE_URL", "postgres://raggr:raggr_dev_password@localhost:5432/raggr"
)
_pgvector_url = _db_url.replace("postgres://", "postgresql+psycopg://")
# Lazy-initialized vector store (defers DB connection to first use)
_vector_store = None
def _get_vector_store() -> PGVector:
global _vector_store
if _vector_store is None:
_vector_store = PGVector(
embeddings=embeddings,
collection_name="simba_docs",
connection=_pgvector_url,
use_jsonb=True,
create_extension=False, # created by docker init script
async_mode=True,
)
return _vector_store
def _get_engine():
"""Get a SQLAlchemy engine for direct queries."""
if not hasattr(_get_engine, "_engine"):
_get_engine._engine = create_engine(_pgvector_url)
return _get_engine._engine
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # chunk size (characters)
@@ -27,6 +73,62 @@ text_splitter = RecursiveCharacterTextSplitter(
add_start_index=True, # track index in original document
)
md_header_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")],
strip_headers=False,
)
md_chunk_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
add_start_index=True,
)
def _split_markdown_document(doc: Document) -> list[Document]:
"""Split a markdown document by headers first, then by size.
Prepends the note filename to each chunk so chunks are self-contained.
"""
note_name = (
Path(doc.metadata.get("filepath", "")).stem
if doc.metadata.get("filepath")
else ""
)
# Split by markdown headers
header_splits = md_header_splitter.split_text(doc.page_content)
# Carry over original document metadata to each header split
for split in header_splits:
split.metadata.update(doc.metadata)
# Then apply size-based splitting on large sections
sized_splits = md_chunk_splitter.split_documents(header_splits)
# Prepend note name for self-contained context
if note_name:
for split in sized_splits:
split.page_content = f"[Note: {note_name}]\n{split.page_content}"
return sized_splits
def _get_collection_id():
"""Get the UUID of our collection from the langchain_pg_collection table."""
engine = _get_engine()
try:
with engine.connect() as conn:
result = conn.execute(
text("SELECT uuid FROM langchain_pg_collection WHERE name = :name"),
{"name": "simba_docs"},
)
row = result.fetchone()
return row[0] if row else None
except Exception:
# Table doesn't exist yet (first run before any indexing)
return None
def date_to_epoch(date_str: str) -> float:
split_date = date_str.split("-")
@@ -49,6 +151,7 @@ async def fetch_documents_from_paperless_ngx() -> list[Document]:
documents = []
for doc in data:
metadata = {
"source": "paperless",
"created_date": date_to_epoch(doc["created_date"]),
"filename": doc["original_file_name"],
"document_type": doctypes.get(doc["document_type"], ""),
@@ -58,12 +161,54 @@ async def fetch_documents_from_paperless_ngx() -> list[Document]:
return documents
def _make_serializable(value):
"""Convert a value to a JSON-serializable type."""
if isinstance(value, (str, int, float, bool, type(None))):
return value
if isinstance(value, (list, tuple)):
return [_make_serializable(v) for v in value]
if isinstance(value, dict):
return {k: _make_serializable(v) for k, v in value.items()}
return str(value)
def _sanitize_text(text_content: str) -> str:
"""Strip non-printable and invalid characters that break embedding tokenizers."""
# Remove null bytes and control characters (keep newlines and tabs)
text_content = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]", "", text_content)
# Remove Unicode surrogates and other problematic Unicode
text_content = re.sub(r"[\ud800-\udfff\ufffe\uffff]", "", text_content)
# Remove replacement character clusters
text_content = text_content.replace("\ufffd", "")
# Collapse excessive whitespace
text_content = re.sub(r" {3,}", " ", text_content)
return text_content.strip()
def _sanitize_documents(documents: list[Document]) -> list[Document]:
"""Sanitize page_content of all documents for embedding compatibility."""
for doc in documents:
doc.page_content = _sanitize_text(doc.page_content)
return [doc for doc in documents if doc.page_content]
async def index_documents():
"""Index Paperless-NGX documents into vector store."""
documents = await fetch_documents_from_paperless_ngx()
splits = text_splitter.split_documents(documents)
await vector_store.aadd_documents(documents=splits)
splits = _sanitize_documents(splits)
logger.info(f"Indexing {len(splits)} chunks from {len(documents)} documents")
vector_store = _get_vector_store()
for i, split in enumerate(splits):
try:
await vector_store.aadd_documents(documents=[split])
except Exception as e:
logger.error(
f"Failed to embed chunk {i} from {split.metadata.get('filename', 'unknown')}: {e}"
)
logger.debug(f"Chunk content preview: {split.page_content[:200]!r}")
raise
async def fetch_obsidian_documents() -> list[Document]:
@@ -85,20 +230,29 @@ async def fetch_obsidian_documents() -> list[Document]:
parsed = obsidian_service.parse_markdown(content, md_path)
# Create LangChain Document with obsidian source
metadata = {
"source": "obsidian",
"filepath": parsed["filepath"],
"folder": str(Path(parsed["filepath"]).parent)
if parsed["filepath"]
else "",
"tags": parsed["tags"],
"created_at": parsed["metadata"].get("created_at"),
"indexed_at": time.time(),
**{
k: v
for k, v in parsed["metadata"].items()
if k not in ["created_at", "created_by"]
},
}
document = Document(
page_content=parsed["content"],
metadata={
"source": "obsidian",
"filepath": parsed["filepath"],
"tags": parsed["tags"],
"created_at": parsed["metadata"].get("created_at"),
**{k: v for k, v in parsed["metadata"].items() if k not in ["created_at", "created_by"]},
},
metadata=_make_serializable(metadata),
)
documents.append(document)
except Exception as e:
print(f"Error reading {md_path}: {e}")
logger.warning(f"Error reading {md_path}: {e}")
continue
return documents
@@ -109,27 +263,168 @@ async def index_obsidian_documents():
Deletes existing obsidian source chunks before re-indexing.
"""
obsidian_service = ObsidianService()
documents = await fetch_obsidian_documents()
if not documents:
print("No Obsidian documents found to index")
logger.info("No Obsidian documents found to index")
return {"indexed": 0}
# Delete existing obsidian chunks
existing_results = vector_store.get(where={"source": "obsidian"})
if existing_results.get("ids"):
await vector_store.adelete(existing_results["ids"])
delete_documents_by_metadata("source", "obsidian")
# Split and index documents
splits = text_splitter.split_documents(documents)
# Split using markdown-aware chunking, sanitize, and index
splits = []
for doc in documents:
splits.extend(_split_markdown_document(doc))
splits = _sanitize_documents(splits)
vector_store = _get_vector_store()
await vector_store.aadd_documents(documents=splits)
return {"indexed": len(documents)}
async def query_vector_store(query: str):
retrieved_docs = await vector_store.asimilarity_search(query, k=2)
# In-memory cache of indexed obsidian files: {filepath: indexed_at}
_obsidian_index_cache: dict[str, float] = {}
def _load_obsidian_index_cache() -> dict[str, float]:
"""Load indexed obsidian files from DB into cache (cold start only)."""
collection_id = _get_collection_id()
if not collection_id:
return {}
engine = _get_engine()
with engine.connect() as conn:
result = conn.execute(
text(
"SELECT DISTINCT cmetadata->>'filepath' AS filepath, "
"MAX((cmetadata->>'indexed_at')::float) AS indexed_at "
"FROM langchain_pg_embedding "
"WHERE collection_id = :cid AND cmetadata->>'source' = 'obsidian' "
"GROUP BY cmetadata->>'filepath'"
),
{"cid": collection_id},
)
return {row[0]: row[1] for row in result if row[0] is not None}
async def sync_obsidian_documents() -> dict[str, int]:
"""Incrementally sync Obsidian documents to pgvector.
Compares file mtimes against stored indexed_at timestamps to only
re-index changed/new files and remove deleted ones.
Returns:
Dict with counts of added, updated, and deleted files.
"""
global _obsidian_index_cache
obsidian_service = ObsidianService()
# Load cache from DB on first run
if not _obsidian_index_cache:
_obsidian_index_cache = _load_obsidian_index_cache()
# Build map of current vault files -> mtime
vault_files: dict[str, float] = {}
for md_path in obsidian_service.walk_vault():
vault_files[str(md_path)] = md_path.stat().st_mtime
added = 0
updated = 0
deleted = 0
# Find files to add or update
files_to_index: list[str] = []
for filepath, mtime in vault_files.items():
indexed_at = _obsidian_index_cache.get(filepath)
if indexed_at is None:
files_to_index.append(filepath)
added += 1
elif mtime > indexed_at:
# Delete old chunks first
delete_documents_by_metadata("filepath", filepath)
files_to_index.append(filepath)
updated += 1
# Find deleted files (in cache but not on disk)
for filepath in list(_obsidian_index_cache):
if filepath not in vault_files:
delete_documents_by_metadata("filepath", filepath)
del _obsidian_index_cache[filepath]
deleted += 1
# Index new/changed files
if files_to_index:
now = time.time()
documents = []
for filepath in files_to_index:
try:
with open(filepath, "r", encoding="utf-8") as f:
content = f.read()
parsed = obsidian_service.parse_markdown(content, filepath)
metadata = {
"source": "obsidian",
"filepath": parsed["filepath"],
"folder": str(Path(parsed["filepath"]).parent)
if parsed["filepath"]
else "",
"tags": parsed["tags"],
"created_at": parsed["metadata"].get("created_at"),
"indexed_at": now,
**{
k: v
for k, v in parsed["metadata"].items()
if k not in ["created_at", "created_by"]
},
}
document = Document(
page_content=parsed["content"],
metadata=_make_serializable(metadata),
)
documents.append(document)
except Exception as e:
logger.warning(f"Error reading {filepath}: {e}")
continue
if documents:
splits = []
for doc in documents:
splits.extend(_split_markdown_document(doc))
splits = _sanitize_documents(splits)
if splits:
vector_store = _get_vector_store()
await vector_store.aadd_documents(documents=splits)
# Update cache for successfully processed files
for filepath in files_to_index:
_obsidian_index_cache[filepath] = now
logger.info(
f"Obsidian sync complete: {added} added, {updated} updated, {deleted} deleted"
)
return {"added": added, "updated": updated, "deleted": deleted}
async def query_vector_store(
query: str,
source: str | None = None,
k: int = 8,
):
"""Query the vector store with optional source filtering and MMR.
Args:
query: Search query text
source: Filter by source metadata (e.g., "obsidian", "paperless")
k: Number of results to return
"""
vector_store = _get_vector_store()
filter_dict = {"source": source} if source else None
retrieved_docs = await vector_store.amax_marginal_relevance_search(
query,
k=k,
fetch_k=k * 3,
filter=filter_dict,
)
serialized = "\n\n".join(
(f"Source: {doc.metadata}\nContent: {doc.page_content}")
for doc in retrieved_docs
@@ -137,33 +432,80 @@ async def query_vector_store(query: str):
return serialized, retrieved_docs
def delete_all_documents():
"""Delete all documents from the vector store collection."""
collection_id = _get_collection_id()
if not collection_id:
return
engine = _get_engine()
with engine.connect() as conn:
conn.execute(
text("DELETE FROM langchain_pg_embedding WHERE collection_id = :cid"),
{"cid": collection_id},
)
conn.commit()
def delete_documents_by_metadata(key: str, value: str):
"""Delete documents matching a metadata key/value pair."""
collection_id = _get_collection_id()
if not collection_id:
return
engine = _get_engine()
with engine.connect() as conn:
conn.execute(
text(
"DELETE FROM langchain_pg_embedding "
"WHERE collection_id = :cid AND cmetadata->>:key = :value"
),
{"cid": collection_id, "key": key, "value": value},
)
conn.commit()
def get_vector_store_stats():
"""Get statistics about the vector store."""
collection = vector_store._collection
count = collection.count()
collection_id = _get_collection_id()
count = 0
if collection_id:
engine = _get_engine()
with engine.connect() as conn:
result = conn.execute(
text(
"SELECT COUNT(*) FROM langchain_pg_embedding WHERE collection_id = :cid"
),
{"cid": collection_id},
)
count = result.scalar()
return {
"total_documents": count,
"collection_name": collection.name,
"collection_name": "simba_docs",
}
def list_all_documents(limit: int = 10):
"""List documents in the vector store with their metadata."""
collection = vector_store._collection
results = collection.get(limit=limit, include=["metadatas", "documents"])
collection_id = _get_collection_id()
if not collection_id:
return []
documents = []
for i, doc_id in enumerate(results["ids"]):
documents.append(
{
"id": doc_id,
"metadata": results["metadatas"][i]
if results.get("metadatas")
else None,
"content_preview": results["documents"][i][:200]
if results.get("documents")
else None,
}
engine = _get_engine()
with engine.connect() as conn:
result = conn.execute(
text(
"SELECT id, document, cmetadata FROM langchain_pg_embedding "
"WHERE collection_id = :cid LIMIT :limit"
),
{"cid": collection_id, "limit": limit},
)
documents = []
for row in result:
documents.append(
{
"id": str(row[0]),
"metadata": row[2],
"content_preview": row[1][:200] if row[1] else None,
}
)
return documents
+171
View File
@@ -0,0 +1,171 @@
import logging
from datetime import datetime, timezone
from quart import Blueprint, request, jsonify
from blueprints.users.decorators import admin_required
from .models import ScheduledMessage, MessageChannel, MessageStatus, Recurrence
scheduled_messages_blueprint = Blueprint(
"scheduled_messages_api", __name__, url_prefix="/api/scheduled-messages"
)
logger = logging.getLogger(__name__)
def _serialize(msg: ScheduledMessage) -> dict:
return {
"id": str(msg.id),
"recipient": msg.recipient,
"channel": msg.channel.value,
"content": msg.content,
"subject": msg.subject,
"scheduled_at": msg.scheduled_at.isoformat(),
"status": msg.status.value,
"recurrence": msg.recurrence.value,
"use_agent": msg.use_agent,
"error_message": msg.error_message,
"created_at": msg.created_at.isoformat(),
"updated_at": msg.updated_at.isoformat(),
}
@scheduled_messages_blueprint.route("/", methods=["GET"])
@admin_required
async def list_messages():
messages = await ScheduledMessage.all().order_by("-scheduled_at")
return jsonify([_serialize(m) for m in messages])
@scheduled_messages_blueprint.route("/", methods=["POST"])
@admin_required
async def create_message():
data = await request.get_json()
if not data:
return jsonify({"error": "Invalid payload"}), 400
recipient = (data.get("recipient") or "").strip()
channel = data.get("channel")
content = (data.get("content") or "").strip()
subject = (data.get("subject") or "").strip() or None
scheduled_at_str = data.get("scheduled_at")
recurrence_str = data.get("recurrence", "none")
if not recipient or not channel or not content or not scheduled_at_str:
return jsonify(
{"error": "recipient, channel, content, and scheduled_at are required"}
), 400
try:
channel_enum = MessageChannel(channel)
except ValueError:
return jsonify(
{"error": f"Invalid channel: {channel}. Must be 'imessage' or 'email'"}
), 400
try:
recurrence_enum = Recurrence(recurrence_str)
except ValueError:
return jsonify(
{
"error": f"Invalid recurrence: {recurrence_str}. Must be 'none', 'daily', 'weekly', or 'monthly'"
}
), 400
if channel_enum == MessageChannel.EMAIL and not subject:
return jsonify({"error": "subject is required for email messages"}), 400
try:
scheduled_at = datetime.fromisoformat(scheduled_at_str)
if scheduled_at.tzinfo is None:
scheduled_at = scheduled_at.replace(tzinfo=timezone.utc)
except ValueError:
return jsonify({"error": "Invalid scheduled_at format"}), 400
if scheduled_at <= datetime.now(timezone.utc):
return jsonify({"error": "scheduled_at must be in the future"}), 400
from quart_jwt_extended import get_jwt_identity
user_id = get_jwt_identity()
use_agent = bool(data.get("use_agent", False))
msg = await ScheduledMessage.create(
recipient=recipient,
channel=channel_enum,
content=content,
subject=subject,
scheduled_at=scheduled_at,
recurrence=recurrence_enum,
use_agent=use_agent,
created_by_id=user_id,
)
return jsonify(_serialize(msg)), 201
@scheduled_messages_blueprint.route("/<msg_id>", methods=["PUT"])
@admin_required
async def update_message(msg_id: str):
msg = await ScheduledMessage.get_or_none(id=msg_id)
if not msg:
return jsonify({"error": "Not found"}), 404
if msg.status != MessageStatus.PENDING:
return jsonify({"error": "Can only update pending messages"}), 400
data = await request.get_json()
if not data:
return jsonify({"error": "Invalid payload"}), 400
if "recipient" in data:
msg.recipient = data["recipient"].strip()
if "channel" in data:
try:
msg.channel = MessageChannel(data["channel"])
except ValueError:
return jsonify({"error": f"Invalid channel: {data['channel']}"}), 400
if "content" in data:
msg.content = data["content"].strip()
if "subject" in data:
msg.subject = data["subject"].strip() or None
if "recurrence" in data:
try:
msg.recurrence = Recurrence(data["recurrence"])
except ValueError:
return jsonify({"error": f"Invalid recurrence: {data['recurrence']}"}), 400
if "use_agent" in data:
msg.use_agent = bool(data["use_agent"])
if "scheduled_at" in data:
try:
scheduled_at = datetime.fromisoformat(data["scheduled_at"])
if scheduled_at.tzinfo is None:
scheduled_at = scheduled_at.replace(tzinfo=timezone.utc)
if scheduled_at <= datetime.now(timezone.utc):
return jsonify({"error": "scheduled_at must be in the future"}), 400
msg.scheduled_at = scheduled_at
except ValueError:
return jsonify({"error": "Invalid scheduled_at format"}), 400
if "status" in data and data["status"] == "cancelled":
msg.status = MessageStatus.CANCELLED
if msg.channel == MessageChannel.EMAIL and not msg.subject:
return jsonify({"error": "subject is required for email messages"}), 400
await msg.save()
return jsonify(_serialize(msg))
@scheduled_messages_blueprint.route("/<msg_id>", methods=["DELETE"])
@admin_required
async def delete_message(msg_id: str):
msg = await ScheduledMessage.get_or_none(id=msg_id)
if not msg:
return jsonify({"error": "Not found"}), 404
if msg.status not in (MessageStatus.PENDING, MessageStatus.CANCELLED):
return jsonify({"error": "Can only delete pending or cancelled messages"}), 400
await msg.delete()
return jsonify({"status": "deleted"})
+48
View File
@@ -0,0 +1,48 @@
import enum
from tortoise import fields
from tortoise.models import Model
class MessageChannel(enum.Enum):
IMESSAGE = "imessage"
EMAIL = "email"
class MessageStatus(enum.Enum):
PENDING = "pending"
SENT = "sent"
FAILED = "failed"
CANCELLED = "cancelled"
class Recurrence(enum.Enum):
NONE = "none"
DAILY = "daily"
WEEKLY = "weekly"
MONTHLY = "monthly"
class ScheduledMessage(Model):
id = fields.UUIDField(primary_key=True)
recipient = fields.CharField(max_length=255)
channel = fields.CharEnumField(enum_type=MessageChannel, max_length=20)
content = fields.TextField()
subject = fields.CharField(max_length=255, null=True)
scheduled_at = fields.DatetimeField()
status = fields.CharEnumField(
enum_type=MessageStatus, max_length=20, default=MessageStatus.PENDING
)
recurrence = fields.CharEnumField(
enum_type=Recurrence, max_length=20, default=Recurrence.NONE
)
use_agent = fields.BooleanField(default=False)
error_message = fields.TextField(null=True)
created_by = fields.ForeignKeyField(
"models.User", related_name="scheduled_messages"
)
created_at = fields.DatetimeField(auto_now_add=True)
updated_at = fields.DatetimeField(auto_now=True)
class Meta:
table = "scheduled_messages"
+113
View File
@@ -0,0 +1,113 @@
import asyncio
import logging
from datetime import datetime, timezone
from dateutil.relativedelta import relativedelta
from .models import ScheduledMessage, MessageChannel, MessageStatus, Recurrence
logger = logging.getLogger(__name__)
POLL_INTERVAL = 15
RECURRENCE_DELTAS = {
Recurrence.DAILY: relativedelta(days=1),
Recurrence.WEEKLY: relativedelta(weeks=1),
Recurrence.MONTHLY: relativedelta(months=1),
}
async def _run_agent(prompt: str) -> str:
"""Run a prompt through the LangChain agent and return the response text."""
from blueprints.conversation.agents import main_agent
from blueprints.conversation.prompts import SIMBA_SYSTEM_PROMPT
messages_payload = [
{"role": "system", "content": SIMBA_SYSTEM_PROMPT},
{"role": "user", "content": prompt},
]
response = await main_agent.ainvoke({"messages": messages_payload})
return response.get("messages", [])[-1].content
async def _schedule_next_occurrence(msg: ScheduledMessage):
"""Create the next pending occurrence for a recurring message."""
delta = RECURRENCE_DELTAS.get(msg.recurrence)
if not delta:
return
next_at = msg.scheduled_at + delta
# If we missed several intervals, advance until we're in the future
now = datetime.now(timezone.utc)
while next_at <= now:
next_at += delta
await ScheduledMessage.create(
recipient=msg.recipient,
channel=msg.channel,
content=msg.content,
subject=msg.subject,
scheduled_at=next_at,
recurrence=msg.recurrence,
use_agent=msg.use_agent,
created_by_id=msg.created_by_id,
)
logger.info(
f"Scheduled next {msg.recurrence.value} occurrence for {msg.id} at {next_at.isoformat()}"
)
async def scheduled_messages_loop():
"""Background loop that polls for and sends due scheduled messages."""
logger.info(f"Scheduled messages loop started (interval={POLL_INTERVAL}s)")
while True:
try:
now = datetime.now(timezone.utc)
due = await ScheduledMessage.filter(
status=MessageStatus.PENDING,
scheduled_at__lte=now,
).all()
for msg in due:
try:
send_content = msg.content
if msg.use_agent:
send_content = await _run_agent(msg.content)
if msg.channel == MessageChannel.IMESSAGE:
from blueprints.imessage import send_imessage
from utils.strip_markdown import strip_markdown
await send_imessage(msg.recipient, strip_markdown(send_content))
elif msg.channel == MessageChannel.EMAIL:
from blueprints.email import send_email_reply
await send_email_reply(
to=msg.recipient,
subject=msg.subject or "(no subject)",
body=send_content,
)
msg.status = MessageStatus.SENT
msg.error_message = None
await msg.save()
logger.info(
f"Sent scheduled {msg.channel.value} message {msg.id} to {msg.recipient}"
)
# Schedule next occurrence for recurring messages
if msg.recurrence != Recurrence.NONE:
await _schedule_next_occurrence(msg)
except Exception as e:
msg.status = MessageStatus.FAILED
msg.error_message = str(e)
await msg.save()
logger.error(f"Failed to send scheduled message {msg.id}: {e}")
except Exception:
logger.exception("Error in scheduled messages loop")
await asyncio.sleep(POLL_INTERVAL)
+97 -35
View File
@@ -212,32 +212,42 @@ async def me():
user = await User.get_or_none(id=user_id)
if not user:
return jsonify({"error": "User not found"}), 404
return jsonify({
"id": str(user.id),
"username": user.username,
"email": user.email,
"is_admin": user.is_admin(),
})
return jsonify(
{
"id": str(user.id),
"username": user.username,
"email": user.email,
"is_admin": user.is_admin(),
}
)
@user_blueprint.route("/admin/users", methods=["GET"])
@admin_required
async def list_users():
from blueprints.email.helpers import get_user_email_address
users = await User.all().order_by("username")
mailgun_domain = os.getenv("MAILGUN_DOMAIN", "")
return jsonify([
{
"id": str(u.id),
"username": u.username,
"email": u.email,
"whatsapp_number": u.whatsapp_number,
"auth_provider": u.auth_provider,
"email_enabled": u.email_enabled,
"email_address": get_user_email_address(u.email_hmac_token, mailgun_domain) if u.email_hmac_token and u.email_enabled else None,
}
for u in users
])
return jsonify(
[
{
"id": str(u.id),
"username": u.username,
"email": u.email,
"whatsapp_number": u.whatsapp_number,
"imessage_number": u.imessage_number,
"auth_provider": u.auth_provider,
"email_enabled": u.email_enabled,
"email_address": get_user_email_address(
u.email_hmac_token, mailgun_domain
)
if u.email_hmac_token and u.email_enabled
else None,
}
for u in users
]
)
@user_blueprint.route("/admin/users/<user_id>/whatsapp", methods=["PUT"])
@@ -254,17 +264,21 @@ async def set_whatsapp(user_id):
conflict = await User.filter(whatsapp_number=number).exclude(id=user_id).first()
if conflict:
return jsonify({"error": "That WhatsApp number is already linked to another account"}), 409
return jsonify(
{"error": "That WhatsApp number is already linked to another account"}
), 409
user.whatsapp_number = number
await user.save()
return jsonify({
"id": str(user.id),
"username": user.username,
"email": user.email,
"whatsapp_number": user.whatsapp_number,
"auth_provider": user.auth_provider,
})
return jsonify(
{
"id": str(user.id),
"username": user.username,
"email": user.email,
"whatsapp_number": user.whatsapp_number,
"auth_provider": user.auth_provider,
}
)
@user_blueprint.route("/admin/users/<user_id>/whatsapp", methods=["DELETE"])
@@ -279,11 +293,55 @@ async def unlink_whatsapp(user_id):
return jsonify({"ok": True})
@user_blueprint.route("/admin/users/<user_id>/imessage", methods=["PUT"])
@admin_required
async def set_imessage(user_id):
data = await request.get_json()
number = (data or {}).get("imessage_number", "").strip()
if not number:
return jsonify({"error": "imessage_number is required"}), 400
user = await User.get_or_none(id=user_id)
if not user:
return jsonify({"error": "User not found"}), 404
conflict = await User.filter(imessage_number=number).exclude(id=user_id).first()
if conflict:
return jsonify(
{"error": "That iMessage number is already linked to another account"}
), 409
user.imessage_number = number
await user.save()
return jsonify(
{
"id": str(user.id),
"username": user.username,
"email": user.email,
"imessage_number": user.imessage_number,
"auth_provider": user.auth_provider,
}
)
@user_blueprint.route("/admin/users/<user_id>/imessage", methods=["DELETE"])
@admin_required
async def unlink_imessage(user_id):
user = await User.get_or_none(id=user_id)
if not user:
return jsonify({"error": "User not found"}), 404
user.imessage_number = None
await user.save()
return jsonify({"ok": True})
@user_blueprint.route("/admin/users/<user_id>/email", methods=["PUT"])
@admin_required
async def toggle_email(user_id):
"""Enable email channel for a user, generating an HMAC token."""
from blueprints.email.helpers import generate_email_token, get_user_email_address
user = await User.get_or_none(id=user_id)
if not user:
return jsonify({"error": "User not found"}), 404
@@ -299,15 +357,19 @@ async def toggle_email(user_id):
user.email_enabled = True
await user.save()
return jsonify({
"id": str(user.id),
"username": user.username,
"email": user.email,
"whatsapp_number": user.whatsapp_number,
"auth_provider": user.auth_provider,
"email_enabled": user.email_enabled,
"email_address": get_user_email_address(user.email_hmac_token, mailgun_domain),
})
return jsonify(
{
"id": str(user.id),
"username": user.username,
"email": user.email,
"whatsapp_number": user.whatsapp_number,
"auth_provider": user.auth_provider,
"email_enabled": user.email_enabled,
"email_address": get_user_email_address(
user.email_hmac_token, mailgun_domain
),
}
)
@user_blueprint.route("/admin/users/<user_id>/email", methods=["DELETE"])
+9 -2
View File
@@ -10,11 +10,18 @@ class User(Model):
username = fields.CharField(max_length=255)
password = fields.BinaryField(null=True) # Hashed - nullable for OIDC users
email = fields.CharField(max_length=100, unique=True)
whatsapp_number = fields.CharField(max_length=30, unique=True, null=True, index=True)
whatsapp_number = fields.CharField(
max_length=30, unique=True, null=True, index=True
)
imessage_number = fields.CharField(
max_length=30, unique=True, null=True, index=True
)
# Email channel fields
email_enabled = fields.BooleanField(default=False)
email_hmac_token = fields.CharField(max_length=16, unique=True, null=True, index=True)
email_hmac_token = fields.CharField(
max_length=16, unique=True, null=True, index=True
)
# OIDC fields
oidc_subject = fields.CharField(
+21 -14
View File
@@ -1,18 +1,16 @@
import os
import logging
import asyncio
import functools
import time
from collections import defaultdict
from quart import Blueprint, request, jsonify, abort
from quart import Blueprint, request, abort
from twilio.request_validator import RequestValidator
from twilio.twiml.messaging_response import MessagingResponse
from blueprints.users.models import User
from blueprints.conversation.logic import (
get_conversation_for_user,
get_conversation_for_channel,
add_message_to_conversation,
get_conversation_transcript,
)
from blueprints.conversation.agents import main_agent
from blueprints.conversation.prompts import SIMBA_SYSTEM_PROMPT
@@ -69,6 +67,7 @@ def validate_twilio_request(f):
so the validated URL matches what Twilio signed against.
Set TWILIO_SIGNATURE_VALIDATION=false to disable in development.
"""
@functools.wraps(f)
async def decorated_function(*args, **kwargs):
if os.getenv("TWILIO_SIGNATURE_VALIDATION", "true").lower() == "false":
@@ -94,6 +93,7 @@ def validate_twilio_request(f):
abort(403)
return await f(*args, **kwargs)
return decorated_function
@@ -104,11 +104,15 @@ async def webhook():
Handle incoming WhatsApp messages from Twilio.
"""
form_data = await request.form
from_number = form_data.get("From") # e.g., "whatsapp:+1234567890"
from_number = form_data.get("From") # e.g., "whatsapp:+1234567890"
body = form_data.get("Body")
if not from_number or not body:
return _twiml_response("Invalid message received.") if from_number else ("Missing From or Body", 400)
return (
_twiml_response("Invalid message received.")
if from_number
else ("Missing From or Body", 400)
)
# Strip whitespace and check for empty body
body = body.strip()
@@ -118,12 +122,16 @@ async def webhook():
# Rate limiting
if not _check_rate_limit(from_number):
logger.warning(f"Rate limit exceeded for {from_number}")
return _twiml_response("You're sending messages too quickly. Please wait a moment and try again.")
return _twiml_response(
"You're sending messages too quickly. Please wait a moment and try again."
)
# Truncate overly long messages
if len(body) > MAX_MESSAGE_LENGTH:
body = body[:MAX_MESSAGE_LENGTH]
logger.info(f"Truncated long message from {from_number} to {MAX_MESSAGE_LENGTH} chars")
logger.info(
f"Truncated long message from {from_number} to {MAX_MESSAGE_LENGTH} chars"
)
logger.info(f"Received WhatsApp message from {from_number}: {body[:100]}")
@@ -143,16 +151,18 @@ async def webhook():
username=username,
email=f"{username}@whatsapp.simbarag.local",
whatsapp_number=from_number,
auth_provider="whatsapp"
auth_provider="whatsapp",
)
logger.info(f"Created new user for WhatsApp: {username}")
except Exception as e:
logger.error(f"Failed to create user for {from_number}: {e}")
return _twiml_response("Sorry, something went wrong setting up your account. Please try again later.")
return _twiml_response(
"Sorry, something went wrong setting up your account. Please try again later."
)
# Get or create a conversation for this user
try:
conversation = await get_conversation_for_user(user=user)
conversation = await get_conversation_for_channel(user=user, channel="whatsapp")
await conversation.fetch_related("messages")
except Exception as e:
logger.error(f"Failed to get conversation for user {user.username}: {e}")
@@ -166,9 +176,6 @@ async def webhook():
user=user,
)
# Get transcript for context
transcript = await get_conversation_transcript(user=user, conversation=conversation)
# Build messages payload for LangChain agent with system prompt and conversation history
try:
# Get last 10 messages for conversation history
+1
View File
@@ -16,6 +16,7 @@ TORTOISE_CONFIG = {
"blueprints.conversation.models",
"blueprints.users.models",
"blueprints.email.models",
"blueprints.scheduled_messages.models",
"aerich.models",
],
"default_connection": "default",
+21 -5
View File
@@ -2,7 +2,7 @@ version: "3.8"
services:
postgres:
image: postgres:16-alpine
image: pgvector/pgvector:pg16
ports:
- "5432:5432"
environment:
@@ -11,6 +11,7 @@ services:
- POSTGRES_DB=${POSTGRES_DB:-raggr}
volumes:
- postgres_data:/var/lib/postgresql/data
- ./docker/init-pgvector.sql:/docker-entrypoint-initdb.d/init-pgvector.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-raggr}"]
interval: 10s
@@ -29,8 +30,9 @@ services:
- PAPERLESS_TOKEN=${PAPERLESS_TOKEN}
- BASE_URL=${BASE_URL}
- OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}
- CHROMADB_PATH=/app/data/chromadb
- OPENAI_API_KEY=${OPENAI_API_KEY}
- EMBEDDING_SERVER_URL=${EMBEDDING_SERVER_URL}
- EMBEDDING_MODEL_NAME=${EMBEDDING_MODEL_NAME}
- JWT_SECRET_KEY=${JWT_SECRET_KEY}
- LLAMA_SERVER_URL=${LLAMA_SERVER_URL}
- LLAMA_MODEL_NAME=${LLAMA_MODEL_NAME}
@@ -49,7 +51,8 @@ services:
- ALLOWED_WHATSAPP_NUMBERS=${ALLOWED_WHATSAPP_NUMBERS}
- TWILIO_SIGNATURE_VALIDATION=${TWILIO_SIGNATURE_VALIDATION:-true}
- TWILIO_WEBHOOK_URL=${TWILIO_WEBHOOK_URL:-}
- OBSIDIAN_AUTH_TOKEN=${OBSIDIAN_AUTH_TOKEN}
- OBSIDIAN_EMAIL=${OBSIDIAN_EMAIL}
- OBSIDIAN_PASSWORD=${OBSIDIAN_PASSWORD}
- OBSIDIAN_VAULT_ID=${OBSIDIAN_VAULT_ID}
- OBSIDIAN_E2E_PASSWORD=${OBSIDIAN_E2E_PASSWORD}
- OBSIDIAN_DEVICE_NAME=${OBSIDIAN_DEVICE_NAME}
@@ -62,14 +65,27 @@ services:
- S3_REGION=${S3_REGION:-garage}
- OLLAMA_HOST=${OLLAMA_HOST:-http://localhost:11434}
- FERNET_KEY=${FERNET_KEY}
- GOOGLE_CALENDAR_ENABLED=${GOOGLE_CALENDAR_ENABLED:-}
- GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE=${GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE:-/app/config/gws-credentials.json}
- MAILGUN_API_KEY=${MAILGUN_API_KEY}
- MAILGUN_DOMAIN=${MAILGUN_DOMAIN}
- MAILGUN_WEBHOOK_SIGNING_KEY=${MAILGUN_WEBHOOK_SIGNING_KEY}
- MAILGUN_SIGNATURE_VALIDATION=${MAILGUN_SIGNATURE_VALIDATION:-true}
- EMAIL_RATE_LIMIT_MAX=${EMAIL_RATE_LIMIT_MAX:-5}
- EMAIL_RATE_LIMIT_WINDOW=${EMAIL_RATE_LIMIT_WINDOW:-300}
- SENDBLUE_API_KEY=${SENDBLUE_API_KEY}
- SENDBLUE_API_SECRET=${SENDBLUE_API_SECRET}
- SENDBLUE_FROM_NUMBER=${SENDBLUE_FROM_NUMBER}
- SENDBLUE_WEBHOOK_SECRET=${SENDBLUE_WEBHOOK_SECRET}
- SENDBLUE_SIGNATURE_VALIDATION=${SENDBLUE_SIGNATURE_VALIDATION:-true}
- ALLOWED_IMESSAGE_NUMBERS=${ALLOWED_IMESSAGE_NUMBERS}
depends_on:
postgres:
condition: service_healthy
volumes:
- chromadb_data:/app/data/chromadb
- ./obvault:/app/data/obsidian
- ./credentials.json:/app/config/gws-credentials.json:ro
restart: unless-stopped
volumes:
chromadb_data:
postgres_data:
+1
View File
@@ -0,0 +1 @@
CREATE EXTENSION IF NOT EXISTS vector;
-278
View File
@@ -1,278 +0,0 @@
import argparse
import datetime
import logging
import os
import sqlite3
import time
from dotenv import load_dotenv
import chromadb
from utils.chunker import Chunker
from utils.cleaner import pdf_to_image, summarize_pdf_image
from llm import LLMClient
from scripts.query import QueryGenerator
from utils.request import PaperlessNGXService
_dotenv_loaded = load_dotenv()
client = chromadb.PersistentClient(path=os.getenv("CHROMADB_PATH", ""))
simba_docs = client.get_or_create_collection(name="simba_docs2")
feline_vet_lookup = client.get_or_create_collection(name="feline_vet_lookup")
parser = argparse.ArgumentParser(
description="An LLM tool to query information about Simba <3"
)
parser.add_argument("query", type=str, help="questions about simba's health")
parser.add_argument(
"--reindex", action="store_true", help="re-index the simba documents"
)
parser.add_argument("--classify", action="store_true", help="test classification")
parser.add_argument("--index", help="index a file")
ppngx = PaperlessNGXService()
llm_client = LLMClient()
def index_using_pdf_llm(doctypes):
logging.info("reindex data...")
files = ppngx.get_data()
for file in files:
document_id: int = file["id"]
pdf_path = ppngx.download_pdf_from_id(id=document_id)
image_paths = pdf_to_image(filepath=pdf_path)
logging.info(f"summarizing {file}")
generated_summary = summarize_pdf_image(filepaths=image_paths)
file["content"] = generated_summary
chunk_data(files, simba_docs, doctypes=doctypes)
def date_to_epoch(date_str: str) -> float:
split_date = date_str.split("-")
date = datetime.datetime(
int(split_date[0]),
int(split_date[1]),
int(split_date[2]),
0,
0,
0,
)
return date.timestamp()
def chunk_data(docs, collection, doctypes):
# Step 2: Create chunks
chunker = Chunker(collection)
logging.info(f"chunking {len(docs)} documents")
texts: list[str] = [doc["content"] for doc in docs]
with sqlite3.connect("database/visited.db") as conn:
to_insert = []
c = conn.cursor()
for index, text in enumerate(texts):
metadata = {
"created_date": date_to_epoch(docs[index]["created_date"]),
"filename": docs[index]["original_file_name"],
"document_type": doctypes.get(docs[index]["document_type"], ""),
}
if doctypes:
metadata["type"] = doctypes.get(docs[index]["document_type"])
chunker.chunk_document(
document=text,
metadata=metadata,
)
to_insert.append((docs[index]["id"],))
c.executemany(
"INSERT INTO indexed_documents (paperless_id) values (?)", to_insert
)
conn.commit()
def chunk_text(texts: list[str], collection):
chunker = Chunker(collection)
for index, text in enumerate(texts):
metadata = {}
chunker.chunk_document(
document=text,
metadata=metadata,
)
def classify_query(query: str, transcript: str) -> bool:
logging.info("Starting query generation")
qg_start = time.time()
qg = QueryGenerator()
query_type = qg.get_query_type(input=query, transcript=transcript)
logging.info(query_type)
qg_end = time.time()
logging.info(f"Query generation took {qg_end - qg_start:.2f} seconds")
return query_type == "Simba"
def consult_oracle(
input: str,
collection,
transcript: str = "",
):
chunker = Chunker(collection)
start_time = time.time()
# Ask
logging.info("Starting query generation")
qg_start = time.time()
qg = QueryGenerator()
doctype_query = qg.get_doctype_query(input=input)
# metadata_filter = qg.get_query(input)
metadata_filter = {**doctype_query}
logging.info(metadata_filter)
qg_end = time.time()
logging.info(f"Query generation took {qg_end - qg_start:.2f} seconds")
logging.info("Starting embedding generation")
embedding_start = time.time()
embeddings = chunker.embedding_fx(inputs=[input])
embedding_end = time.time()
logging.info(
f"Embedding generation took {embedding_end - embedding_start:.2f} seconds"
)
logging.info("Starting collection query")
query_start = time.time()
results = collection.query(
query_texts=[input],
query_embeddings=embeddings,
where=metadata_filter,
)
query_end = time.time()
logging.info(f"Collection query took {query_end - query_start:.2f} seconds")
# Generate
logging.info("Starting LLM generation")
llm_start = time.time()
system_prompt = "You are a helpful assistant that understands veterinary terms."
transcript_prompt = f"Here is the message transcript thus far {transcript}."
prompt = f"""Using the following data, help answer the user's query by providing as many details as possible.
Using this data: {results}. {transcript_prompt if len(transcript) > 0 else ""}
Respond to this prompt: {input}"""
output = llm_client.chat(prompt=prompt, system_prompt=system_prompt)
llm_end = time.time()
logging.info(f"LLM generation took {llm_end - llm_start:.2f} seconds")
total_time = time.time() - start_time
logging.info(f"Total consult_oracle execution took {total_time:.2f} seconds")
return output
def llm_chat(input: str, transcript: str = "") -> str:
system_prompt = "You are a helpful assistant that understands veterinary terms."
transcript_prompt = f"Here is the message transcript thus far {transcript}."
prompt = f"""Answer the user in as if you were a cat named Simba. Don't act too catlike. Be assertive.
{transcript_prompt if len(transcript) > 0 else ""}
Respond to this prompt: {input}"""
output = llm_client.chat(prompt=prompt, system_prompt=system_prompt)
return output
def paperless_workflow(input):
# Step 1: Get the text
ppngx = PaperlessNGXService()
docs = ppngx.get_data()
chunk_data(docs, collection=simba_docs)
consult_oracle(input, simba_docs)
def consult_simba_oracle(input: str, transcript: str = ""):
is_simba_related = classify_query(query=input, transcript=transcript)
if is_simba_related:
logging.info("Query is related to simba")
return consult_oracle(
input=input,
collection=simba_docs,
transcript=transcript,
)
logging.info("Query is NOT related to simba")
return llm_chat(input=input, transcript=transcript)
def filter_indexed_files(docs):
with sqlite3.connect("database/visited.db") as conn:
c = conn.cursor()
c.execute(
"CREATE TABLE IF NOT EXISTS indexed_documents (id INTEGER PRIMARY KEY AUTOINCREMENT, paperless_id INTEGER)"
)
c.execute("SELECT paperless_id FROM indexed_documents")
rows = c.fetchall()
conn.commit()
visited = {row[0] for row in rows}
return [doc for doc in docs if doc["id"] not in visited]
def reindex():
with sqlite3.connect("database/visited.db") as conn:
c = conn.cursor()
# Ensure the table exists before trying to delete from it
c.execute(
"CREATE TABLE IF NOT EXISTS indexed_documents (id INTEGER PRIMARY KEY AUTOINCREMENT, paperless_id INTEGER)"
)
c.execute("DELETE FROM indexed_documents")
conn.commit()
# Delete all documents from the collection
all_docs = simba_docs.get()
if all_docs["ids"]:
simba_docs.delete(ids=all_docs["ids"])
logging.info("Fetching documents from Paperless-NGX")
ppngx = PaperlessNGXService()
docs = ppngx.get_data()
docs = filter_indexed_files(docs)
logging.info(f"Fetched {len(docs)} documents")
# Delete all chromadb data
ids = simba_docs.get(ids=None, limit=None, offset=0)
all_ids = ids["ids"]
if len(all_ids) > 0:
simba_docs.delete(ids=all_ids)
# Chunk documents
logging.info("Chunking documents now ...")
doctype_lookup = ppngx.get_doctypes()
chunk_data(docs, collection=simba_docs, doctypes=doctype_lookup)
logging.info("Done chunking documents")
if __name__ == "__main__":
args = parser.parse_args()
if args.reindex:
reindex()
if args.classify:
consult_simba_oracle(input="yohohoho testing")
consult_simba_oracle(input="write an email")
consult_simba_oracle(input="how much does simba weigh")
if args.query:
logging.info("Consulting oracle ...")
print(
consult_oracle(
input=args.query,
collection=simba_docs,
)
)
else:
logging.info("please provide a query")
+2 -2
View File
@@ -5,7 +5,8 @@ description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"chromadb>=1.1.0",
"langchain-postgres>=0.0.13",
"psycopg[binary]>=3.1.0",
"python-dotenv>=1.0.0",
"flask>=3.1.2",
"httpx>=0.28.1",
@@ -30,7 +31,6 @@ dependencies = [
"asyncpg>=0.30.0",
"langchain-openai>=1.1.6",
"langchain>=1.2.0",
"langchain-chroma>=1.0.0",
"langchain-community>=0.4.1",
"jq>=1.10.0",
"tavily-python>=0.7.17",
+3 -38
View File
@@ -1,48 +1,13 @@
import { useState, useEffect } from "react";
import "./App.css";
import { AuthProvider } from "./contexts/AuthContext";
import { ChatScreen } from "./components/ChatScreen";
import { LoginScreen } from "./components/LoginScreen";
import { conversationService } from "./api/conversationService";
import { useAuthCheck } from "./hooks/useAuthCheck";
import catIcon from "./assets/cat.png";
const AppContainer = () => {
const [isAuthenticated, setAuthenticated] = useState<boolean>(false);
const [isChecking, setIsChecking] = useState<boolean>(true);
const { isAuthenticated, isChecking, isAdmin, setAuthenticated } = useAuthCheck();
useEffect(() => {
const checkAuth = async () => {
const accessToken = localStorage.getItem("access_token");
const refreshToken = localStorage.getItem("refresh_token");
// No tokens at all, not authenticated
if (!accessToken && !refreshToken) {
setIsChecking(false);
setAuthenticated(false);
return;
}
// Try to verify token by making a request
try {
await conversationService.getAllConversations();
// If successful, user is authenticated
setAuthenticated(true);
} catch (error) {
// Token is invalid or expired
console.error("Authentication check failed:", error);
localStorage.removeItem("access_token");
localStorage.removeItem("refresh_token");
setAuthenticated(false);
} finally {
setIsChecking(false);
}
};
checkAuth();
}, []);
// Show loading state while checking authentication
if (isChecking) {
return (
<div className="h-screen flex flex-col items-center justify-center bg-cream gap-4">
@@ -61,7 +26,7 @@ const AppContainer = () => {
return (
<>
{isAuthenticated ? (
<ChatScreen setAuthenticated={setAuthenticated} />
<ChatScreen setAuthenticated={setAuthenticated} isAdmin={isAdmin} />
) : (
<LoginScreen setAuthenticated={setAuthenticated} />
)}
@@ -0,0 +1,72 @@
import { userService } from "./userService";
export interface ScheduledMessage {
id: string;
recipient: string;
channel: "imessage" | "email";
content: string;
subject: string | null;
scheduled_at: string;
status: "pending" | "sent" | "failed" | "cancelled";
recurrence: "none" | "daily" | "weekly" | "monthly";
use_agent: boolean;
error_message: string | null;
created_at: string;
updated_at: string;
}
export interface CreateScheduledMessage {
recipient: string;
channel: "imessage" | "email";
content: string;
subject?: string;
scheduled_at: string;
recurrence?: "none" | "daily" | "weekly" | "monthly";
use_agent?: boolean;
}
class ScheduledMessageService {
private baseUrl = "/api/scheduled-messages";
async list(): Promise<ScheduledMessage[]> {
const response = await userService.fetchWithRefreshToken(`${this.baseUrl}/`);
if (!response.ok) throw new Error("Failed to list scheduled messages");
return response.json();
}
async create(data: CreateScheduledMessage): Promise<ScheduledMessage> {
const response = await userService.fetchWithRefreshToken(`${this.baseUrl}/`, {
method: "POST",
body: JSON.stringify(data),
});
if (!response.ok) {
const err = await response.json();
throw new Error(err.error ?? "Failed to create scheduled message");
}
return response.json();
}
async update(id: string, data: Partial<CreateScheduledMessage> & { status?: string }): Promise<ScheduledMessage> {
const response = await userService.fetchWithRefreshToken(`${this.baseUrl}/${id}`, {
method: "PUT",
body: JSON.stringify(data),
});
if (!response.ok) {
const err = await response.json();
throw new Error(err.error ?? "Failed to update scheduled message");
}
return response.json();
}
async remove(id: string): Promise<void> {
const response = await userService.fetchWithRefreshToken(`${this.baseUrl}/${id}`, {
method: "DELETE",
});
if (!response.ok) {
const err = await response.json();
throw new Error(err.error ?? "Failed to delete scheduled message");
}
}
}
export const scheduledMessageService = new ScheduledMessageService();
+15 -29
View File
@@ -1,4 +1,4 @@
import { useEffect, useState } from "react";
import { useState } from "react";
import { X, Phone, PhoneOff, Pencil, Check, Mail, Copy } from "lucide-react";
import { userService, type AdminUserRecord } from "../api/userService";
import { cn } from "../lib/utils";
@@ -12,27 +12,19 @@ import {
TableHeader,
TableRow,
} from "./ui/table";
import { useAdminUsers } from "../hooks/useAdminUsers";
type Props = {
onClose: () => void;
};
export const AdminPanel = ({ onClose }: Props) => {
const [users, setUsers] = useState<AdminUserRecord[]>([]);
const [loading, setLoading] = useState(true);
const { users, loading, updateUser } = useAdminUsers();
const [editingId, setEditingId] = useState<string | null>(null);
const [editValue, setEditValue] = useState("");
const [rowError, setRowError] = useState<Record<string, string>>({});
const [rowSuccess, setRowSuccess] = useState<Record<string, string>>({});
useEffect(() => {
userService
.adminListUsers()
.then(setUsers)
.catch(() => {})
.finally(() => setLoading(false));
}, []);
const startEdit = (user: AdminUserRecord) => {
setEditingId(user.id);
setEditValue(user.whatsapp_number ?? "");
@@ -49,8 +41,8 @@ export const AdminPanel = ({ onClose }: Props) => {
setRowError((p) => ({ ...p, [userId]: "" }));
try {
const updated = await userService.adminSetWhatsapp(userId, editValue);
setUsers((p) => p.map((u) => (u.id === userId ? updated : u)));
setRowSuccess((p) => ({ ...p, [userId]: "Saved" }));
updateUser(userId, () => updated);
setRowSuccess((p) => ({ ...p, [userId]: "Saved" }));
setEditingId(null);
setTimeout(() => setRowSuccess((p) => ({ ...p, [userId]: "" })), 2000);
} catch (err) {
@@ -65,10 +57,8 @@ export const AdminPanel = ({ onClose }: Props) => {
setRowError((p) => ({ ...p, [userId]: "" }));
try {
await userService.adminUnlinkWhatsapp(userId);
setUsers((p) =>
p.map((u) => (u.id === userId ? { ...u, whatsapp_number: null } : u)),
);
setRowSuccess((p) => ({ ...p, [userId]: "Unlinked ✓" }));
updateUser(userId, (u) => ({ ...u, whatsapp_number: null }));
setRowSuccess((p) => ({ ...p, [userId]: "Unlinked" }));
setTimeout(() => setRowSuccess((p) => ({ ...p, [userId]: "" })), 2000);
} catch (err) {
setRowError((p) => ({
@@ -82,8 +72,8 @@ export const AdminPanel = ({ onClose }: Props) => {
setRowError((p) => ({ ...p, [userId]: "" }));
try {
const updated = await userService.adminToggleEmail(userId);
setUsers((p) => p.map((u) => (u.id === userId ? updated : u)));
setRowSuccess((p) => ({ ...p, [userId]: "Email enabled" }));
updateUser(userId, () => updated);
setRowSuccess((p) => ({ ...p, [userId]: "Email enabled" }));
setTimeout(() => setRowSuccess((p) => ({ ...p, [userId]: "" })), 2000);
} catch (err) {
setRowError((p) => ({
@@ -97,10 +87,8 @@ export const AdminPanel = ({ onClose }: Props) => {
setRowError((p) => ({ ...p, [userId]: "" }));
try {
await userService.adminDisableEmail(userId);
setUsers((p) =>
p.map((u) => (u.id === userId ? { ...u, email_enabled: false, email_address: null } : u)),
);
setRowSuccess((p) => ({ ...p, [userId]: "Email disabled ✓" }));
updateUser(userId, (u) => ({ ...u, email_enabled: false, email_address: null }));
setRowSuccess((p) => ({ ...p, [userId]: "Email disabled" }));
setTimeout(() => setRowSuccess((p) => ({ ...p, [userId]: "" })), 2000);
} catch (err) {
setRowError((p) => ({
@@ -112,7 +100,7 @@ export const AdminPanel = ({ onClose }: Props) => {
const copyToClipboard = (text: string, userId: string) => {
navigator.clipboard.writeText(text);
setRowSuccess((p) => ({ ...p, [userId]: "Copied" }));
setRowSuccess((p) => ({ ...p, [userId]: "Copied" }));
setTimeout(() => setRowSuccess((p) => ({ ...p, [userId]: "" })), 2000);
};
@@ -128,7 +116,6 @@ export const AdminPanel = ({ onClose }: Props) => {
"border border-sand-light/60",
)}
>
{/* Header */}
<div className="flex items-center justify-between px-6 py-4 border-b border-sand-light/60">
<div className="flex items-center gap-2.5">
<div className="w-8 h-8 rounded-xl bg-leaf-pale flex items-center justify-center">
@@ -146,7 +133,6 @@ export const AdminPanel = ({ onClose }: Props) => {
</button>
</div>
{/* Body */}
<div className="overflow-y-auto flex-1 rounded-b-3xl">
{loading ? (
<div className="px-6 py-12 text-center text-warm-gray text-sm">
@@ -155,7 +141,7 @@ export const AdminPanel = ({ onClose }: Props) => {
<span className="loading-dot w-2 h-2 rounded-full bg-amber-soft inline-block" />
<span className="loading-dot w-2 h-2 rounded-full bg-amber-soft inline-block" />
</div>
Loading users
Loading users...
</div>
) : (
<Table>
@@ -204,7 +190,7 @@ export const AdminPanel = ({ onClose }: Props) => {
: "text-warm-gray/40 italic",
)}
>
{user.whatsapp_number ?? ""}
{user.whatsapp_number ?? "\u2014"}
</span>
{rowSuccess[user.id] && (
<span className="text-xs text-leaf-dark">
@@ -235,7 +221,7 @@ export const AdminPanel = ({ onClose }: Props) => {
</button>
</div>
) : (
<span className="text-sm text-warm-gray/40 italic"></span>
<span className="text-sm text-warm-gray/40 italic">\u2014</span>
)}
</div>
</TableCell>
@@ -1,3 +1,4 @@
import React from "react";
import ReactMarkdown from "react-markdown";
import { cn } from "../lib/utils";
@@ -6,7 +7,7 @@ type AnswerBubbleProps = {
loading?: boolean;
};
export const AnswerBubble = ({ text, loading }: AnswerBubbleProps) => {
export const AnswerBubble = React.memo(({ text, loading }: AnswerBubbleProps) => {
return (
<div className="flex justify-start message-enter">
<div
@@ -17,7 +18,6 @@ export const AnswerBubble = ({ text, loading }: AnswerBubbleProps) => {
"overflow-hidden",
)}
>
{/* amber accent bar */}
<div className="h-0.5 w-full bg-gradient-to-r from-amber-soft via-amber-glow/50 to-transparent" />
<div className="px-4 py-3">
@@ -36,4 +36,4 @@ export const AnswerBubble = ({ text, loading }: AnswerBubbleProps) => {
</div>
</div>
);
};
});
+76 -204
View File
@@ -1,213 +1,87 @@
import { useCallback, useEffect, useState, useRef } from "react";
import { LogOut, Shield, PanelLeftClose, PanelLeftOpen, Menu, X } from "lucide-react";
import { conversationService } from "../api/conversationService";
import { userService } from "../api/userService";
import { useCallback, useState, useRef } from "react";
import { LogOut, Shield, Clock, PanelLeftClose, PanelLeftOpen, Menu, X } from "lucide-react";
import { QuestionBubble } from "./QuestionBubble";
import { AnswerBubble } from "./AnswerBubble";
import { ToolBubble } from "./ToolBubble";
import { MessageInput } from "./MessageInput";
import { ConversationList } from "./ConversationList";
import { AdminPanel } from "./AdminPanel";
import { ScheduledMessagesPanel } from "./ScheduledMessagesPanel";
import { cn } from "../lib/utils";
import { useConversations } from "../hooks/useConversations";
import { useChat } from "../hooks/useChat";
import catIcon from "../assets/cat.png";
type Message = {
text: string;
speaker: "simba" | "user" | "tool";
image_key?: string | null;
};
type Conversation = {
title: string;
id: string;
};
type ChatScreenProps = {
setAuthenticated: (isAuth: boolean) => void;
isAdmin: boolean;
};
const TOOL_MESSAGES: Record<string, string> = {
simba_search: "🔍 Searching Simba's records...",
web_search: "🌐 Searching the web...",
get_current_date: "📅 Checking today's date...",
ynab_budget_summary: "💰 Checking budget summary...",
ynab_search_transactions: "💳 Looking up transactions...",
ynab_category_spending: "📊 Analyzing category spending...",
ynab_insights: "📈 Generating budget insights...",
obsidian_search_notes: "📝 Searching notes...",
obsidian_read_note: "📖 Reading note...",
obsidian_create_note: "✏️ Saving note...",
obsidian_create_task: "✅ Creating task...",
journal_get_today: "📔 Reading today's journal...",
journal_get_tasks: "📋 Getting tasks...",
journal_add_task: " Adding task...",
journal_complete_task: "✔️ Completing task...",
};
export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
const [query, setQuery] = useState<string>("");
const [simbaMode, setSimbaMode] = useState<boolean>(false);
const [messages, setMessages] = useState<Message[]>([]);
const [conversations, setConversations] = useState<Conversation[]>([]);
const [showConversations, setShowConversations] = useState<boolean>(false);
const [selectedConversation, setSelectedConversation] =
useState<Conversation | null>(null);
const [sidebarCollapsed, setSidebarCollapsed] = useState<boolean>(false);
const [isLoading, setIsLoading] = useState<boolean>(false);
const [isAdmin, setIsAdmin] = useState<boolean>(false);
const [showAdminPanel, setShowAdminPanel] = useState<boolean>(false);
const [pendingImage, setPendingImage] = useState<File | null>(null);
export const ChatScreen = ({ setAuthenticated, isAdmin }: ChatScreenProps) => {
const [query, setQuery] = useState("");
const [simbaMode, setSimbaMode] = useState(false);
const [showConversations, setShowConversations] = useState(false);
const [sidebarCollapsed, setSidebarCollapsed] = useState(false);
const [showAdminPanel, setShowAdminPanel] = useState(false);
const [showScheduler, setShowScheduler] = useState(false);
const messagesEndRef = useRef<HTMLDivElement>(null);
const isMountedRef = useRef<boolean>(true);
const abortControllerRef = useRef<AbortController | null>(null);
const simbaAnswers = ["meow.", "hiss...", "purrrrrr", "yowOWROWWowowr"];
const isLoadingRef = useRef(false);
const scrollToBottom = useCallback(() => {
requestAnimationFrame(() => {
messagesEndRef.current?.scrollIntoView({
behavior: isLoading ? "instant" : "smooth",
behavior: isLoadingRef.current ? "instant" : "smooth",
});
});
}, [isLoading]);
useEffect(() => {
isMountedRef.current = true;
return () => {
isMountedRef.current = false;
abortControllerRef.current?.abort();
};
}, []);
const handleSelectConversation = (conversation: Conversation) => {
setShowConversations(false);
setSelectedConversation(conversation);
const load = async () => {
try {
const fetched = await conversationService.getConversation(conversation.id);
setMessages(
fetched.messages.map((m) => ({ text: m.text, speaker: m.speaker, image_key: m.image_key })),
);
} catch (err) {
console.error("Failed to load messages:", err);
}
};
load();
};
const {
conversations,
selectedConversation,
selectConversation,
createConversation,
refreshConversations,
} = useConversations();
const loadConversations = async () => {
try {
const fetched = await conversationService.getAllConversations();
const parsed = fetched.map((c) => ({ id: c.id, title: c.name }));
setConversations(parsed);
} catch (err) {
console.error("Failed to load conversations:", err);
}
};
const onSessionExpired = useCallback(() => setAuthenticated(false), [setAuthenticated]);
const handleCreateNewConversation = async () => {
const newConv = await conversationService.createConversation();
await loadConversations();
setSelectedConversation({ title: newConv.name, id: newConv.id });
};
const {
messages,
setMessages,
isLoading,
pendingImage,
setPendingImage,
sendMessage,
} = useChat({
selectedConversation,
createConversation,
refreshConversations,
onSessionExpired,
scrollToBottom,
});
useEffect(() => {
loadConversations();
userService.getMe().then((me) => setIsAdmin(me.is_admin)).catch(() => {});
}, []);
// Keep ref in sync for scrollToBottom behavior
isLoadingRef.current = isLoading;
useEffect(() => {
scrollToBottom();
}, [messages]);
const handleSelectConversation = useCallback(
async (conversation: { title: string; id: string }) => {
setShowConversations(false);
const loaded = await selectConversation(conversation);
setMessages(loaded);
},
[selectConversation, setMessages],
);
const handleQuestionSubmit = useCallback(async () => {
if ((!query.trim() && !pendingImage) || isLoading) return;
const handleCreateNewConversation = useCallback(async () => {
await createConversation();
setMessages([]);
}, [createConversation, setMessages]);
let activeConversation = selectedConversation;
if (!activeConversation) {
const newConv = await conversationService.createConversation();
activeConversation = { title: newConv.name, id: newConv.id };
setSelectedConversation(activeConversation);
setConversations((prev) => [activeConversation!, ...prev]);
}
// Capture pending image before clearing state
const imageFile = pendingImage;
const currMessages = messages.concat([{ text: query, speaker: "user" }]);
setMessages(currMessages);
const handleQuestionSubmit = useCallback(() => {
sendMessage(query, simbaMode);
setQuery("");
setPendingImage(null);
setIsLoading(true);
if (simbaMode) {
const randomElement = simbaAnswers[Math.floor(Math.random() * simbaAnswers.length)];
setMessages((prev) => prev.concat([{ text: randomElement, speaker: "simba" }]));
setIsLoading(false);
return;
}
const abortController = new AbortController();
abortControllerRef.current = abortController;
try {
// Upload image first if present
let imageKey: string | undefined;
if (imageFile) {
const uploadResult = await conversationService.uploadImage(
imageFile,
activeConversation.id,
);
imageKey = uploadResult.image_key;
// Update the user message with the image key
setMessages((prev) => {
const updated = [...prev];
// Find the last user message we just added
for (let i = updated.length - 1; i >= 0; i--) {
if (updated[i].speaker === "user") {
updated[i] = { ...updated[i], image_key: imageKey };
break;
}
}
return updated;
});
}
await conversationService.streamQuery(
query,
activeConversation.id,
(event) => {
if (!isMountedRef.current) return;
if (event.type === "tool_start") {
const friendly = TOOL_MESSAGES[event.tool] ?? `🔧 Using ${event.tool}...`;
setMessages((prev) => prev.concat([{ text: friendly, speaker: "tool" }]));
} else if (event.type === "response") {
setMessages((prev) => prev.concat([{ text: event.message, speaker: "simba" }]));
} else if (event.type === "error") {
console.error("Stream error:", event.message);
}
},
abortController.signal,
imageKey,
);
} catch (error) {
if (error instanceof Error && error.name === "AbortError") {
console.log("Request was aborted");
} else {
console.error("Failed to send query:", error);
if (error instanceof Error && error.message.includes("Session expired")) {
setAuthenticated(false);
}
}
} finally {
if (isMountedRef.current) {
setIsLoading(false);
loadConversations();
}
abortControllerRef.current = null;
}
}, [query, pendingImage, isLoading, selectedConversation, simbaMode, messages, setAuthenticated]);
}, [query, simbaMode, sendMessage]);
const handleQueryChange = useCallback((event: React.ChangeEvent<HTMLTextAreaElement>) => {
setQuery(event.target.value);
@@ -221,8 +95,8 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
}
}, [handleQuestionSubmit]);
const handleImageSelect = useCallback((file: File) => setPendingImage(file), []);
const handleClearImage = useCallback(() => setPendingImage(null), []);
const handleImageSelect = useCallback((file: File) => setPendingImage(file), [setPendingImage]);
const handleClearImage = useCallback(() => setPendingImage(null), [setPendingImage]);
const handleLogout = () => {
localStorage.removeItem("access_token");
@@ -232,7 +106,7 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
return (
<div className="h-screen h-[100dvh] flex flex-row bg-cream overflow-hidden">
{/* ── Desktop Sidebar ─────────────────────────────── */}
{/* Desktop Sidebar */}
<aside
className={cn(
"hidden md:flex md:flex-col",
@@ -241,7 +115,6 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
)}
>
{sidebarCollapsed ? (
/* Collapsed state */
<div className="flex flex-col items-center py-4 gap-4 h-full">
<button
onClick={() => setSidebarCollapsed(false)}
@@ -256,9 +129,7 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
/>
</div>
) : (
/* Expanded state */
<div className="flex flex-col h-full">
{/* Header */}
<div className="flex items-center justify-between px-4 py-4 border-b border-white/8">
<div className="flex items-center gap-2.5">
<img src={catIcon} alt="Simba" className="w-12 h-12" />
@@ -277,7 +148,6 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
</button>
</div>
{/* Conversations */}
<div className="flex-1 overflow-y-auto px-2 py-3">
<ConversationList
conversations={conversations}
@@ -287,16 +157,24 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
/>
</div>
{/* Footer */}
<div className="px-2 pb-3 pt-2 border-t border-white/8 flex flex-col gap-0.5">
{isAdmin && (
<button
onClick={() => setShowAdminPanel(true)}
className="flex items-center gap-2 w-full px-3 py-2 rounded-xl text-sm text-cream/50 hover:text-cream hover:bg-white/8 transition-all cursor-pointer"
>
<Shield size={14} />
<span>Admin</span>
</button>
<>
<button
onClick={() => setShowAdminPanel(true)}
className="flex items-center gap-2 w-full px-3 py-2 rounded-xl text-sm text-cream/50 hover:text-cream hover:bg-white/8 transition-all cursor-pointer"
>
<Shield size={14} />
<span>Admin</span>
</button>
<button
onClick={() => setShowScheduler(true)}
className="flex items-center gap-2 w-full px-3 py-2 rounded-xl text-sm text-cream/50 hover:text-cream hover:bg-white/8 transition-all cursor-pointer"
>
<Clock size={14} />
<span>Scheduler</span>
</button>
</>
)}
<button
onClick={handleLogout}
@@ -310,12 +188,10 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
)}
</aside>
{/* Admin Panel modal */}
{showAdminPanel && <AdminPanel onClose={() => setShowAdminPanel(false)} />}
{showScheduler && <ScheduledMessagesPanel onClose={() => setShowScheduler(false)} />}
{/* ── Main chat area ──────────────────────────────── */}
<div className="flex-1 flex flex-col h-full overflow-hidden min-w-0">
{/* Mobile header */}
<header className="md:hidden flex items-center justify-between px-4 py-3 bg-warm-white border-b border-sand-light/60">
<div className="flex items-center gap-2">
<img src={catIcon} alt="Simba" className="w-12 h-12" />
@@ -343,9 +219,7 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
</header>
{messages.length === 0 ? (
/* ── Empty / homepage state ── */
<div className="flex-1 flex flex-col items-center justify-center px-4 gap-6">
{/* Mobile conversation drawer */}
{showConversations && (
<div className="md:hidden w-full max-w-2xl bg-warm-white rounded-2xl border border-sand-light p-3 shadow-sm">
<ConversationList
@@ -382,11 +256,9 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
</div>
</div>
) : (
/* ── Active chat state ── */
<>
<div className="flex-1 overflow-y-auto px-4 py-6">
<div className="max-w-2xl mx-auto flex flex-col gap-3">
{/* Mobile conversation drawer */}
{showConversations && (
<div className="md:hidden mb-3 bg-warm-white rounded-2xl border border-sand-light p-3 shadow-sm">
<ConversationList
@@ -422,8 +294,8 @@ export const ChatScreen = ({ setAuthenticated }: ChatScreenProps) => {
setSimbaMode={setSimbaMode}
isLoading={isLoading}
pendingImage={pendingImage}
onImageSelect={(file) => setPendingImage(file)}
onClearImage={() => setPendingImage(null)}
onImageSelect={handleImageSelect}
onClearImage={handleClearImage}
/>
</div>
</footer>
@@ -1,7 +1,5 @@
import { useState, useEffect } from "react";
import { Plus } from "lucide-react";
import { cn } from "../lib/utils";
import { conversationService } from "../api/conversationService";
type Conversation = {
title: string;
@@ -23,32 +21,8 @@ export const ConversationList = ({
selectedId,
variant = "dark",
}: ConversationProps) => {
const [items, setItems] = useState(conversations);
useEffect(() => {
const load = async () => {
try {
let fetched = await conversationService.getAllConversations();
if (fetched.length === 0) {
await conversationService.createConversation();
fetched = await conversationService.getAllConversations();
}
setItems(fetched.map((c) => ({ id: c.id, title: c.name })));
} catch (err) {
console.error("Failed to load conversations:", err);
}
};
load();
}, []);
// Keep in sync when parent updates conversations
useEffect(() => {
setItems(conversations);
}, [conversations]);
return (
<div className="flex flex-col gap-1">
{/* New thread button */}
<button
onClick={onCreateNewConversation}
className={cn(
@@ -63,8 +37,7 @@ export const ConversationList = ({
<span>New thread</span>
</button>
{/* Conversation items */}
{items.map((conv) => {
{conversations.map((conv) => {
const isActive = conv.id === selectedId;
return (
<button
+6 -57
View File
@@ -1,66 +1,19 @@
import { useState, useEffect } from "react";
import { userService } from "../api/userService";
import { oidcService } from "../api/oidcService";
import catIcon from "../assets/cat.png";
import { cn } from "../lib/utils";
import { useOIDCAuth } from "../hooks/useOIDCAuth";
type LoginScreenProps = {
setAuthenticated: (isAuth: boolean) => void;
};
export const LoginScreen = ({ setAuthenticated }: LoginScreenProps) => {
const [error, setError] = useState<string>("");
const [isChecking, setIsChecking] = useState<boolean>(true);
const [isLoggingIn, setIsLoggingIn] = useState<boolean>(false);
useEffect(() => {
const initAuth = async () => {
const callbackParams = oidcService.getCallbackParamsFromURL();
if (callbackParams) {
try {
setIsLoggingIn(true);
const result = await oidcService.handleCallback(
callbackParams.code,
callbackParams.state,
);
localStorage.setItem("access_token", result.access_token);
localStorage.setItem("refresh_token", result.refresh_token);
oidcService.clearCallbackParams();
setAuthenticated(true);
setIsChecking(false);
return;
} catch (err) {
console.error("OIDC callback error:", err);
setError("Login failed. Please try again.");
oidcService.clearCallbackParams();
setIsLoggingIn(false);
setIsChecking(false);
return;
}
}
const isValid = await userService.validateToken();
if (isValid) setAuthenticated(true);
setIsChecking(false);
};
initAuth();
}, [setAuthenticated]);
const handleOIDCLogin = async () => {
try {
setIsLoggingIn(true);
setError("");
const authUrl = await oidcService.initiateLogin();
window.location.href = authUrl;
} catch {
setError("Failed to initiate login. Please try again.");
setIsLoggingIn(false);
}
};
const { isChecking, isLoggingIn, error, handleLogin } = useOIDCAuth({
setAuthenticated,
});
if (isChecking || isLoggingIn) {
return (
<div className="h-screen flex flex-col items-center justify-center bg-cream gap-4">
{/* Subtle dot grid */}
<div
className="fixed inset-0 pointer-events-none opacity-[0.035]"
style={{
@@ -85,7 +38,6 @@ export const LoginScreen = ({ setAuthenticated }: LoginScreenProps) => {
return (
<div className="h-screen bg-cream flex items-center justify-center p-4 relative overflow-hidden">
{/* Background dot texture */}
<div
className="fixed inset-0 pointer-events-none opacity-[0.04]"
style={{
@@ -94,12 +46,10 @@ export const LoginScreen = ({ setAuthenticated }: LoginScreenProps) => {
}}
/>
{/* Decorative background blobs */}
<div className="absolute top-1/4 -left-20 w-72 h-72 rounded-full bg-leaf-pale/60 blur-3xl pointer-events-none" />
<div className="absolute bottom-1/4 -right-20 w-64 h-64 rounded-full bg-amber-pale/70 blur-3xl pointer-events-none" />
<div className="relative w-full max-w-sm">
{/* Branding */}
<div className="flex flex-col items-center mb-8">
<div className="relative mb-5">
<div className="absolute -inset-5 bg-amber-soft/30 rounded-full blur-2xl" />
@@ -120,7 +70,6 @@ export const LoginScreen = ({ setAuthenticated }: LoginScreenProps) => {
</p>
</div>
{/* Card */}
<div
className={cn(
"bg-warm-white rounded-3xl border border-sand-light",
@@ -138,7 +87,7 @@ export const LoginScreen = ({ setAuthenticated }: LoginScreenProps) => {
</p>
<button
onClick={handleOIDCLogin}
onClick={handleLogin}
disabled={isLoggingIn}
className={cn(
"w-full py-3.5 px-4 rounded-2xl text-sm font-semibold tracking-wide",
@@ -154,7 +103,7 @@ export const LoginScreen = ({ setAuthenticated }: LoginScreenProps) => {
</div>
<p className="text-center text-sand mt-5 text-xs tracking-widest select-none">
meow
* meow *
</p>
</div>
</div>
@@ -1,26 +1,14 @@
import { useEffect, useState } from "react";
import React from "react";
import { cn } from "../lib/utils";
import { conversationService } from "../api/conversationService";
import { usePresignedUrl } from "../hooks/usePresignedUrl";
type QuestionBubbleProps = {
text: string;
image_key?: string | null;
};
export const QuestionBubble = ({ text, image_key }: QuestionBubbleProps) => {
const [imageUrl, setImageUrl] = useState<string | null>(null);
const [imageError, setImageError] = useState(false);
useEffect(() => {
if (!image_key) return;
conversationService
.getPresignedImageUrl(image_key)
.then(setImageUrl)
.catch((err) => {
console.error("Failed to load image:", err);
setImageError(true);
});
}, [image_key]);
export const QuestionBubble = React.memo(({ text, image_key }: QuestionBubbleProps) => {
const { imageUrl, imageError } = usePresignedUrl(image_key);
return (
<div className="flex justify-end message-enter">
@@ -34,7 +22,6 @@ export const QuestionBubble = ({ text, image_key }: QuestionBubbleProps) => {
>
{imageError && (
<div className="flex items-center gap-2 text-xs text-charcoal/50 bg-charcoal/5 rounded-xl px-3 py-2 mb-2">
<span>🖼</span>
<span>Image failed to load</span>
</div>
)}
@@ -49,4 +36,4 @@ export const QuestionBubble = ({ text, image_key }: QuestionBubbleProps) => {
</div>
</div>
);
};
});
@@ -0,0 +1,335 @@
import { useState } from "react";
import { X, Clock, Send, Trash2, XCircle, RotateCcw, Repeat, Bot } from "lucide-react";
import { cn } from "../lib/utils";
import { Button } from "./ui/button";
import { Input } from "./ui/input";
import { Badge } from "./ui/badge";
import {
Table,
TableBody,
TableCell,
TableHead,
TableHeader,
TableRow,
} from "./ui/table";
import { useScheduledMessages } from "../hooks/useScheduledMessages";
import {
scheduledMessageService,
type CreateScheduledMessage,
} from "../api/scheduledMessageService";
type Props = {
onClose: () => void;
};
const STATUS_BADGE: Record<string, "amber" | "default" | "destructive" | "muted"> = {
pending: "amber",
sent: "default",
failed: "destructive",
cancelled: "muted",
};
export const ScheduledMessagesPanel = ({ onClose }: Props) => {
const { messages, loading, refresh } = useScheduledMessages();
const [channel, setChannel] = useState<"imessage" | "email">("imessage");
const [recipient, setRecipient] = useState("");
const [subject, setSubject] = useState("");
const [content, setContent] = useState("");
const [scheduledAt, setScheduledAt] = useState("");
const [recurrence, setRecurrence] = useState<"none" | "daily" | "weekly" | "monthly">("none");
const [useAgent, setUseAgent] = useState(false);
const [error, setError] = useState("");
const [submitting, setSubmitting] = useState(false);
const handleCreate = async () => {
setError("");
if (!recipient || !content || !scheduledAt) {
setError("Recipient, content, and scheduled time are required.");
return;
}
if (channel === "email" && !subject) {
setError("Subject is required for email.");
return;
}
setSubmitting(true);
try {
const data: CreateScheduledMessage = {
recipient,
channel,
content,
scheduled_at: new Date(scheduledAt).toISOString(),
recurrence,
use_agent: useAgent,
};
if (channel === "email") data.subject = subject;
await scheduledMessageService.create(data);
setRecipient("");
setSubject("");
setContent("");
setScheduledAt("");
setRecurrence("none");
setUseAgent(false);
refresh();
} catch (err) {
setError(err instanceof Error ? err.message : "Failed to schedule message");
} finally {
setSubmitting(false);
}
};
const handleCancel = async (id: string) => {
try {
await scheduledMessageService.update(id, { status: "cancelled" });
refresh();
} catch {}
};
const handleDelete = async (id: string) => {
try {
await scheduledMessageService.remove(id);
refresh();
} catch {}
};
const handleRetry = async (id: string) => {
try {
const futureTime = new Date(Date.now() + 30_000).toISOString();
await scheduledMessageService.update(id, { scheduled_at: futureTime });
refresh();
} catch {}
};
return (
<div
className="fixed inset-0 z-50 flex items-center justify-center bg-charcoal/40 backdrop-blur-sm"
onClick={(e) => e.target === e.currentTarget && onClose()}
>
<div
className={cn(
"bg-warm-white rounded-3xl shadow-2xl shadow-charcoal/20",
"w-full max-w-3xl mx-4 max-h-[85vh] flex flex-col",
"border border-sand-light/60",
)}
>
{/* Header */}
<div className="flex items-center justify-between px-6 py-4 border-b border-sand-light/60">
<div className="flex items-center gap-2.5">
<div className="w-8 h-8 rounded-xl bg-amber-pale flex items-center justify-center">
<Clock size={14} className="text-amber-glow" />
</div>
<h2 className="text-sm font-semibold text-charcoal">
Scheduled Messages
</h2>
</div>
<button
onClick={onClose}
className="w-7 h-7 rounded-lg flex items-center justify-center text-warm-gray hover:text-charcoal hover:bg-cream-dark transition-colors cursor-pointer"
>
<X size={15} />
</button>
</div>
<div className="overflow-y-auto flex-1 rounded-b-3xl">
{/* Create form */}
<div className="px-6 py-5 border-b border-sand-light/60 space-y-3">
<div className="flex items-center gap-2">
<button
onClick={() => setChannel("imessage")}
className={cn(
"px-3 py-1.5 rounded-lg text-xs font-medium transition-colors cursor-pointer",
channel === "imessage"
? "bg-leaf-pale text-leaf-dark"
: "bg-sand-light/40 text-warm-gray hover:text-charcoal",
)}
>
iMessage
</button>
<button
onClick={() => setChannel("email")}
className={cn(
"px-3 py-1.5 rounded-lg text-xs font-medium transition-colors cursor-pointer",
channel === "email"
? "bg-leaf-pale text-leaf-dark"
: "bg-sand-light/40 text-warm-gray hover:text-charcoal",
)}
>
Email
</button>
</div>
<div className="flex gap-2">
<Input
value={recipient}
onChange={(e) => setRecipient(e.target.value)}
placeholder={channel === "imessage" ? "+15551234567" : "user@example.com"}
className="flex-1"
/>
<Input
type="datetime-local"
value={scheduledAt}
onChange={(e) => setScheduledAt(e.target.value)}
className="w-52"
/>
</div>
<div className="flex items-center gap-2">
<Repeat size={12} className="text-warm-gray" />
<span className="text-xs text-warm-gray">Repeat:</span>
{(["none", "daily", "weekly", "monthly"] as const).map((r) => (
<button
key={r}
onClick={() => setRecurrence(r)}
className={cn(
"px-2.5 py-1 rounded-lg text-xs font-medium transition-colors cursor-pointer",
recurrence === r
? "bg-amber-pale text-amber-glow"
: "bg-sand-light/40 text-warm-gray hover:text-charcoal",
)}
>
{r === "none" ? "Once" : r.charAt(0).toUpperCase() + r.slice(1)}
</button>
))}
</div>
<div className="flex items-center gap-2">
<button
onClick={() => setUseAgent(!useAgent)}
className={cn(
"flex items-center gap-1.5 px-3 py-1.5 rounded-lg text-xs font-medium transition-colors cursor-pointer",
useAgent
? "bg-leaf-pale text-leaf-dark"
: "bg-sand-light/40 text-warm-gray hover:text-charcoal",
)}
>
<Bot size={12} />
Ask Simba
</button>
<span className="text-xs text-warm-gray">
{useAgent ? "Content is a prompt — Simba's response will be sent" : "Content sent as-is"}
</span>
</div>
{channel === "email" && (
<Input
value={subject}
onChange={(e) => setSubject(e.target.value)}
placeholder="Subject"
/>
)}
<textarea
value={content}
onChange={(e) => setContent(e.target.value)}
placeholder={useAgent ? "Enter a prompt for Simba..." : "Message content..."}
rows={3}
className="w-full rounded-xl border border-sand bg-cream-light px-3 py-2 text-sm text-charcoal placeholder:text-warm-gray/50 focus:outline-none focus:ring-2 focus:ring-leaf/30 resize-none"
/>
{error && <p className="text-xs text-red-500">{error}</p>}
<Button onClick={handleCreate} disabled={submitting} size="sm">
<Send size={12} />
{submitting ? "Scheduling..." : "Schedule"}
</Button>
</div>
{/* Message list */}
{loading ? (
<div className="px-6 py-12 text-center text-warm-gray text-sm">
<div className="flex justify-center gap-1.5 mb-3">
<span className="loading-dot w-2 h-2 rounded-full bg-amber-soft inline-block" />
<span className="loading-dot w-2 h-2 rounded-full bg-amber-soft inline-block" />
<span className="loading-dot w-2 h-2 rounded-full bg-amber-soft inline-block" />
</div>
Loading...
</div>
) : messages.length === 0 ? (
<div className="px-6 py-12 text-center text-warm-gray text-sm">
No scheduled messages yet.
</div>
) : (
<Table>
<TableHeader>
<TableRow>
<TableHead>Channel</TableHead>
<TableHead>Recipient</TableHead>
<TableHead>Content</TableHead>
<TableHead>Scheduled</TableHead>
<TableHead>Repeat</TableHead>
<TableHead>Status</TableHead>
<TableHead className="w-28">Actions</TableHead>
</TableRow>
</TableHeader>
<TableBody>
{messages.map((msg) => (
<TableRow key={msg.id}>
<TableCell className="capitalize text-xs">
{msg.channel}
</TableCell>
<TableCell className="text-xs truncate max-w-[140px]" title={msg.recipient}>
{msg.recipient}
</TableCell>
<TableCell className="text-xs max-w-[180px]" title={msg.content}>
<div className="flex items-center gap-1">
{msg.use_agent && <Bot size={10} className="text-leaf-dark shrink-0" />}
<span className="truncate">
{msg.content.length > 60
? msg.content.slice(0, 60) + "..."
: msg.content}
</span>
</div>
</TableCell>
<TableCell className="text-xs text-warm-gray">
{new Date(msg.scheduled_at).toLocaleString()}
</TableCell>
<TableCell className="text-xs text-warm-gray capitalize">
{msg.recurrence === "none" ? "—" : msg.recurrence}
</TableCell>
<TableCell>
<Badge variant={STATUS_BADGE[msg.status]}>{msg.status}</Badge>
</TableCell>
<TableCell>
<div className="flex gap-1">
{msg.status === "pending" && (
<Button
size="sm"
variant="ghost-dark"
onClick={() => handleCancel(msg.id)}
title="Cancel"
>
<XCircle size={11} />
</Button>
)}
{msg.status === "failed" && (
<Button
size="sm"
variant="ghost-dark"
onClick={() => handleRetry(msg.id)}
title="Retry"
>
<RotateCcw size={11} />
</Button>
)}
{(msg.status === "pending" || msg.status === "cancelled") && (
<Button
size="sm"
variant="destructive"
onClick={() => handleDelete(msg.id)}
title="Delete"
>
<Trash2 size={11} />
</Button>
)}
</div>
</TableCell>
</TableRow>
))}
</TableBody>
</Table>
)}
</div>
</div>
</div>
);
};
+3 -2
View File
@@ -1,6 +1,7 @@
import React from "react";
import { cn } from "../lib/utils";
export const ToolBubble = ({ text }: { text: string }) => (
export const ToolBubble = React.memo(({ text }: { text: string }) => (
<div className="flex justify-center message-enter">
<div
className={cn(
@@ -12,4 +13,4 @@ export const ToolBubble = ({ text }: { text: string }) => (
{text}
</div>
</div>
);
));
@@ -9,6 +9,7 @@ const badgeVariants = cva(
default: "bg-leaf-pale text-leaf-dark border border-leaf-light/50",
amber: "bg-amber-pale text-amber-glow border border-amber-soft/40",
muted: "bg-sand-light/60 text-warm-gray border border-sand/40",
destructive: "bg-red-50 text-red-600 border border-red-200/50",
},
},
defaultVariants: {
+21
View File
@@ -0,0 +1,21 @@
import { useState, useEffect } from "react";
import { userService, type AdminUserRecord } from "../api/userService";
export function useAdminUsers() {
const [users, setUsers] = useState<AdminUserRecord[]>([]);
const [loading, setLoading] = useState(true);
useEffect(() => {
userService
.adminListUsers()
.then(setUsers)
.catch(() => {})
.finally(() => setLoading(false));
}, []);
const updateUser = (userId: string, updater: (u: AdminUserRecord) => AdminUserRecord) => {
setUsers((prev) => prev.map((u) => (u.id === userId ? updater(u) : u)));
};
return { users, loading, updateUser };
}
+37
View File
@@ -0,0 +1,37 @@
import { useState, useEffect } from "react";
import { userService } from "../api/userService";
export function useAuthCheck() {
const [isAuthenticated, setAuthenticated] = useState(false);
const [isChecking, setIsChecking] = useState(true);
const [isAdmin, setIsAdmin] = useState(false);
useEffect(() => {
const checkAuth = async () => {
const accessToken = localStorage.getItem("access_token");
const refreshToken = localStorage.getItem("refresh_token");
if (!accessToken && !refreshToken) {
setIsChecking(false);
setAuthenticated(false);
return;
}
try {
const me = await userService.getMe();
setAuthenticated(true);
setIsAdmin(me.is_admin);
} catch {
localStorage.removeItem("access_token");
localStorage.removeItem("refresh_token");
setAuthenticated(false);
} finally {
setIsChecking(false);
}
};
checkAuth();
}, []);
return { isAuthenticated, isChecking, isAdmin, setAuthenticated };
}
+183
View File
@@ -0,0 +1,183 @@
import { useState, useCallback, useEffect, useRef } from "react";
import { conversationService } from "../api/conversationService";
import type { Conversation } from "./useConversations";
type Message = {
text: string;
speaker: "simba" | "user" | "tool";
image_key?: string | null;
};
const TOOL_MESSAGES: Record<string, string> = {
simba_search: "Searching Simba's records...",
web_search: "Searching the web...",
get_current_date: "Checking today's date...",
ynab_budget_summary: "Checking budget summary...",
ynab_search_transactions: "Looking up transactions...",
ynab_category_spending: "Analyzing category spending...",
ynab_insights: "Generating budget insights...",
obsidian_search_notes: "Searching notes...",
obsidian_read_note: "Reading note...",
obsidian_create_note: "Saving note...",
obsidian_create_task: "Creating task...",
journal_get_today: "Reading today's journal...",
journal_get_tasks: "Getting tasks...",
journal_add_task: "Adding task...",
journal_complete_task: "Completing task...",
};
const simbaAnswers = ["meow.", "hiss...", "purrrrrr", "yowOWROWWowowr"];
type UseChatOptions = {
selectedConversation: Conversation | null;
createConversation: () => Promise<Conversation>;
refreshConversations: () => Promise<void>;
onSessionExpired: () => void;
scrollToBottom: () => void;
};
export function useChat({
selectedConversation,
createConversation,
refreshConversations,
onSessionExpired,
scrollToBottom,
}: UseChatOptions) {
const [messages, setMessages] = useState<Message[]>([]);
const [isLoading, setIsLoading] = useState(false);
const [pendingImage, setPendingImage] = useState<File | null>(null);
const isMountedRef = useRef(true);
const abortControllerRef = useRef<AbortController | null>(null);
useEffect(() => {
isMountedRef.current = true;
return () => {
isMountedRef.current = false;
abortControllerRef.current?.abort();
};
}, []);
const updateMessages = useCallback(
(updater: Message[] | ((prev: Message[]) => Message[])) => {
setMessages(updater);
scrollToBottom();
},
[scrollToBottom],
);
const sendMessage = useCallback(
async (query: string, simbaMode: boolean) => {
if ((!query.trim() && !pendingImage) || isLoading) return;
let activeConversation = selectedConversation;
let createdNew = false;
if (!activeConversation) {
activeConversation = await createConversation();
createdNew = true;
}
const imageFile = pendingImage;
updateMessages((prev) => prev.concat([{ text: query, speaker: "user" }]));
setPendingImage(null);
setIsLoading(true);
if (simbaMode) {
const randomElement =
simbaAnswers[Math.floor(Math.random() * simbaAnswers.length)];
updateMessages((prev) =>
prev.concat([{ text: randomElement, speaker: "simba" }]),
);
setIsLoading(false);
return;
}
const abortController = new AbortController();
abortControllerRef.current = abortController;
try {
let imageKey: string | undefined;
if (imageFile) {
const uploadResult = await conversationService.uploadImage(
imageFile,
activeConversation.id,
);
imageKey = uploadResult.image_key;
updateMessages((prev) => {
const updated = [...prev];
for (let i = updated.length - 1; i >= 0; i--) {
if (updated[i].speaker === "user") {
updated[i] = { ...updated[i], image_key: imageKey };
break;
}
}
return updated;
});
}
await conversationService.streamQuery(
query,
activeConversation.id,
(event) => {
if (!isMountedRef.current) return;
if (event.type === "tool_start") {
const friendly =
TOOL_MESSAGES[event.tool] ?? `Using ${event.tool}...`;
updateMessages((prev) =>
prev.concat([{ text: friendly, speaker: "tool" }]),
);
} else if (event.type === "response") {
updateMessages((prev) =>
prev.concat([{ text: event.message, speaker: "simba" }]),
);
} else if (event.type === "error") {
console.error("Stream error:", event.message);
}
},
abortController.signal,
imageKey,
);
} catch (error) {
if (error instanceof Error && error.name === "AbortError") {
console.log("Request was aborted");
} else {
console.error("Failed to send query:", error);
if (
error instanceof Error &&
error.message.includes("Session expired")
) {
onSessionExpired();
}
}
} finally {
if (isMountedRef.current) {
setIsLoading(false);
if (createdNew) {
refreshConversations();
}
}
abortControllerRef.current = null;
}
},
[
pendingImage,
isLoading,
selectedConversation,
createConversation,
refreshConversations,
onSessionExpired,
updateMessages,
],
);
return {
messages,
setMessages: updateMessages,
isLoading,
pendingImage,
setPendingImage,
sendMessage,
};
}
@@ -0,0 +1,69 @@
import { useState, useCallback, useEffect } from "react";
import { conversationService } from "../api/conversationService";
export type Conversation = {
title: string;
id: string;
};
type Message = {
text: string;
speaker: "simba" | "user" | "tool";
image_key?: string | null;
};
export function useConversations() {
const [conversations, setConversations] = useState<Conversation[]>([]);
const [selectedConversation, setSelectedConversation] =
useState<Conversation | null>(null);
const refreshConversations = useCallback(async () => {
try {
const fetched = await conversationService.getAllConversations();
setConversations(fetched.map((c) => ({ id: c.id, title: c.name })));
} catch (err) {
console.error("Failed to load conversations:", err);
}
}, []);
useEffect(() => {
refreshConversations();
}, [refreshConversations]);
const selectConversation = useCallback(
async (conversation: Conversation): Promise<Message[]> => {
setSelectedConversation(conversation);
try {
const fetched = await conversationService.getConversation(
conversation.id,
);
return fetched.messages.map((m) => ({
text: m.text,
speaker: m.speaker,
image_key: m.image_key,
}));
} catch (err) {
console.error("Failed to load messages:", err);
return [];
}
},
[],
);
const createConversation = useCallback(async (): Promise<Conversation> => {
const newConv = await conversationService.createConversation();
const conversation = { title: newConv.name, id: newConv.id };
setConversations((prev) => [conversation, ...prev]);
setSelectedConversation(conversation);
return conversation;
}, []);
return {
conversations,
selectedConversation,
setSelectedConversation,
selectConversation,
createConversation,
refreshConversations,
};
}
+59
View File
@@ -0,0 +1,59 @@
import { useState, useEffect } from "react";
import { userService } from "../api/userService";
import { oidcService } from "../api/oidcService";
type UseOIDCAuthOptions = {
setAuthenticated: (isAuth: boolean) => void;
};
export function useOIDCAuth({ setAuthenticated }: UseOIDCAuthOptions) {
const [isChecking, setIsChecking] = useState(true);
const [isLoggingIn, setIsLoggingIn] = useState(false);
const [error, setError] = useState("");
useEffect(() => {
const initAuth = async () => {
const callbackParams = oidcService.getCallbackParamsFromURL();
if (callbackParams) {
try {
setIsLoggingIn(true);
const result = await oidcService.handleCallback(
callbackParams.code,
callbackParams.state,
);
localStorage.setItem("access_token", result.access_token);
localStorage.setItem("refresh_token", result.refresh_token);
oidcService.clearCallbackParams();
setAuthenticated(true);
setIsChecking(false);
return;
} catch (err) {
console.error("OIDC callback error:", err);
setError("Login failed. Please try again.");
oidcService.clearCallbackParams();
setIsLoggingIn(false);
setIsChecking(false);
return;
}
}
const isValid = await userService.validateToken();
if (isValid) setAuthenticated(true);
setIsChecking(false);
};
initAuth();
}, [setAuthenticated]);
const handleLogin = async () => {
try {
setIsLoggingIn(true);
setError("");
const authUrl = await oidcService.initiateLogin();
window.location.href = authUrl;
} catch {
setError("Failed to initiate login. Please try again.");
setIsLoggingIn(false);
}
};
return { isChecking, isLoggingIn, error, handleLogin };
}
@@ -0,0 +1,34 @@
import { useState, useEffect } from "react";
import { conversationService } from "../api/conversationService";
const urlCache = new Map<string, string>();
export function usePresignedUrl(imageKey: string | null | undefined) {
const [imageUrl, setImageUrl] = useState<string | null>(
imageKey ? (urlCache.get(imageKey) ?? null) : null,
);
const [imageError, setImageError] = useState(false);
useEffect(() => {
if (!imageKey) return;
const cached = urlCache.get(imageKey);
if (cached) {
setImageUrl(cached);
return;
}
conversationService
.getPresignedImageUrl(imageKey)
.then((url) => {
urlCache.set(imageKey, url);
setImageUrl(url);
})
.catch((err) => {
console.error("Failed to load image:", err);
setImageError(true);
});
}, [imageKey]);
return { imageUrl, imageError };
}
@@ -0,0 +1,25 @@
import { useState, useEffect, useCallback } from "react";
import {
scheduledMessageService,
type ScheduledMessage,
} from "../api/scheduledMessageService";
export function useScheduledMessages() {
const [messages, setMessages] = useState<ScheduledMessage[]>([]);
const [loading, setLoading] = useState(true);
const refresh = useCallback(() => {
setLoading(true);
scheduledMessageService
.list()
.then(setMessages)
.catch(() => {})
.finally(() => setLoading(false));
}, []);
useEffect(() => {
refresh();
}, [refresh]);
return { messages, loading, refresh };
}
+70
View File
@@ -0,0 +1,70 @@
"""Link an iMessage phone number to an existing user account.
Usage:
python scripts/link_imessage.py <username_or_email> <phone_number>
python scripts/link_imessage.py ryan +15551234567
Run inside Docker:
docker compose exec raggr python scripts/link_imessage.py ryan +15551234567
"""
import os
import sys
import asyncio
from dotenv import load_dotenv
from tortoise import Tortoise
from blueprints.users.models import User
load_dotenv()
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite://database/raggr.db")
async def link_imessage(identifier: str, phone_number: str):
await Tortoise.init(
db_url=DATABASE_URL,
modules={
"models": [
"blueprints.users.models",
"blueprints.conversation.models",
]
},
)
try:
user = await User.filter(username=identifier).first()
if not user:
user = await User.filter(email=identifier).first()
if not user:
print(f"Error: No user found with username or email '{identifier}'")
return False
conflict = (
await User.filter(imessage_number=phone_number).exclude(id=user.id).first()
)
if conflict:
print(
f"Error: {phone_number} is already linked to user '{conflict.username}'"
)
return False
user.imessage_number = phone_number
await user.save()
print(f"Linked {phone_number} to user '{user.username}' ({user.email})")
return True
finally:
await Tortoise.close_connections()
if __name__ == "__main__":
if len(sys.argv) != 3:
print(
"Usage: python scripts/link_imessage.py <username_or_email> <phone_number>"
)
sys.exit(1)
success = asyncio.run(link_imessage(sys.argv[1], sys.argv[2]))
sys.exit(0 if success else 1)
+8 -16
View File
@@ -6,19 +6,19 @@ import asyncio
import sys
from blueprints.rag.logic import (
delete_all_documents,
get_vector_store_stats,
index_documents,
list_all_documents,
vector_store,
)
def stats():
"""Show vector store statistics."""
stats = get_vector_store_stats()
s = get_vector_store_stats()
print("=== Vector Store Statistics ===")
print(f"Collection: {stats['collection_name']}")
print(f"Total Documents: {stats['total_documents']}")
print(f"Collection: {s['collection_name']}")
print(f"Total Documents: {s['total_documents']}")
async def index():
@@ -26,23 +26,15 @@ async def index():
print("Starting indexing process...")
print("Fetching documents from Paperless-NGX...")
await index_documents()
print("Indexing complete!")
print("Indexing complete!")
stats()
async def reindex():
"""Clear and reindex all documents."""
print("Clearing existing documents...")
collection = vector_store._collection
all_docs = collection.get()
if all_docs["ids"]:
print(f"Deleting {len(all_docs['ids'])} existing documents...")
collection.delete(ids=all_docs["ids"])
print("✓ Cleared")
else:
print("Collection is already empty")
delete_all_documents()
print("Cleared")
await index()
@@ -113,7 +105,7 @@ Examples:
print("\n\nOperation cancelled by user")
sys.exit(1)
except Exception as e:
print(f"\nError: {e}", file=sys.stderr)
print(f"\nError: {e}", file=sys.stderr)
sys.exit(1)
-24
View File
@@ -1,24 +0,0 @@
from bs4 import BeautifulSoup
import chromadb
import httpx
client = chromadb.PersistentClient(path="/Users/ryanchen/Programs/raggr/chromadb")
# Scrape
BASE_URL = "https://www.vet.cornell.edu"
LIST_URL = "/departments-centers-and-institutes/cornell-feline-health-center/health-information/feline-health-topics"
QUERY_URL = BASE_URL + LIST_URL
r = httpx.get(QUERY_URL)
soup = BeautifulSoup(r.text)
container = soup.find("div", class_="field-body")
a_s = container.find_all("a", href=True)
new_texts = []
for link in a_s:
endpoint = link["href"]
query_url = BASE_URL + endpoint
r2 = httpx.get(query_url)
article_soup = BeautifulSoup(r2.text)
-3
View File
@@ -1,9 +1,6 @@
#!/bin/bash
set -e
echo "Initializing directories..."
mkdir -p /app/data/chromadb
echo "Rebuilding frontend..."
cd /app/raggr-frontend
yarn build
+25 -2
View File
@@ -8,8 +8,31 @@ mkdir -p /app/data/obsidian
# Start continuous Obsidian sync if enabled
if [ "${OBSIDIAN_CONTINUOUS_SYNC}" = "true" ]; then
echo "Starting Obsidian continuous sync in background..."
ob sync --continuous &
if [ -z "${OBSIDIAN_EMAIL}" ] || [ -z "${OBSIDIAN_PASSWORD}" ] || [ -z "${OBSIDIAN_VAULT_ID}" ]; then
echo "WARNING: OBSIDIAN_EMAIL, OBSIDIAN_PASSWORD, or OBSIDIAN_VAULT_ID not set. Skipping sync."
else
echo "Setting up Obsidian sync..."
VAULT_PATH="${OBSIDIAN_VAULT_PATH:-/app/data/obsidian}"
# Login and setup sync (foreground, must complete before sync starts)
if ob login --email "${OBSIDIAN_EMAIL}" --password "${OBSIDIAN_PASSWORD}" && \
ob sync-setup \
--vault "${OBSIDIAN_VAULT_ID}" \
--path "${VAULT_PATH}" \
--password "${OBSIDIAN_E2E_PASSWORD}" \
--device-name "${OBSIDIAN_DEVICE_NAME:-simbarag}"; then
# Remove stale lock from previous container run
rm -rf "${VAULT_PATH}/.obsidian/.sync.lock"
# Set sync to pull-only (read-only) mode
ob sync-config --mode pull-only --path "${VAULT_PATH}"
# Start continuous sync in background
echo "Starting Obsidian continuous sync (pull-only)..."
ob sync --continuous --path "${VAULT_PATH}" &
else
echo "WARNING: Obsidian sync setup failed. Continuing without sync."
fi
fi
fi
echo "Starting application..."
-139
View File
@@ -1,139 +0,0 @@
"""Tests for text preprocessing functions in utils/chunker.py."""
from utils.chunker import (
remove_headers_footers,
remove_special_characters,
remove_repeated_substrings,
remove_extra_spaces,
preprocess_text,
)
class TestRemoveHeadersFooters:
def test_removes_default_header(self):
text = "Header Line\nActual content here"
result = remove_headers_footers(text)
assert "Header" not in result
assert "Actual content here" in result
def test_removes_default_footer(self):
text = "Actual content\nFooter Line"
result = remove_headers_footers(text)
assert "Footer" not in result
assert "Actual content" in result
def test_custom_patterns(self):
text = "PAGE 1\nContent\nCopyright 2024"
result = remove_headers_footers(
text,
header_patterns=[r"^PAGE \d+$"],
footer_patterns=[r"^Copyright.*$"],
)
assert "PAGE 1" not in result
assert "Copyright" not in result
assert "Content" in result
def test_no_match_preserves_text(self):
text = "Just normal content"
result = remove_headers_footers(text)
assert result == "Just normal content"
def test_empty_string(self):
assert remove_headers_footers("") == ""
class TestRemoveSpecialCharacters:
def test_removes_special_chars(self):
text = "Hello @world #test $100"
result = remove_special_characters(text)
assert "@" not in result
assert "#" not in result
assert "$" not in result
def test_preserves_allowed_chars(self):
text = "Hello, world! How's it going? Yes-no."
result = remove_special_characters(text)
assert "," in result
assert "!" in result
assert "'" in result
assert "?" in result
assert "-" in result
assert "." in result
def test_custom_pattern(self):
text = "keep @this but not #that"
result = remove_special_characters(text, special_chars=r"[#]")
assert "@this" in result
assert "#" not in result
def test_empty_string(self):
assert remove_special_characters("") == ""
class TestRemoveRepeatedSubstrings:
def test_collapses_dots(self):
text = "Item.....Value"
result = remove_repeated_substrings(text)
assert result == "Item.Value"
def test_single_dot_preserved(self):
text = "End of sentence."
result = remove_repeated_substrings(text)
assert result == "End of sentence."
def test_custom_pattern(self):
text = "hello---world"
result = remove_repeated_substrings(text, pattern=r"-{2,}")
# Function always replaces matched pattern with "."
assert result == "hello.world"
def test_empty_string(self):
assert remove_repeated_substrings("") == ""
class TestRemoveExtraSpaces:
def test_collapses_multiple_blank_lines(self):
text = "Line 1\n\n\n\nLine 2"
result = remove_extra_spaces(text)
# After collapsing newlines to \n\n, then \s+ collapses everything to single spaces
assert "\n\n\n" not in result
def test_collapses_multiple_spaces(self):
text = "Hello world"
result = remove_extra_spaces(text)
assert result == "Hello world"
def test_strips_whitespace(self):
text = " Hello world "
result = remove_extra_spaces(text)
assert result == "Hello world"
def test_empty_string(self):
assert remove_extra_spaces("") == ""
class TestPreprocessText:
def test_full_pipeline(self):
text = "Header Info\nHello @world... with spaces\nFooter Info"
result = preprocess_text(text)
assert "Header" not in result
assert "Footer" not in result
assert "@" not in result
assert "..." not in result
assert " " not in result
def test_preserves_meaningful_content(self):
text = "The cat weighs 10 pounds."
result = preprocess_text(text)
assert "cat" in result
assert "10" in result
assert "pounds" in result
def test_empty_string(self):
assert preprocess_text("") == ""
def test_already_clean(self):
text = "Simple clean text here."
result = preprocess_text(text)
assert "Simple" in result
assert "clean" in result
+4 -2
View File
@@ -93,13 +93,15 @@ class TestGetDailyNotePath:
def test_formats_path_correctly(self, service):
date = datetime(2026, 3, 15)
path = service.get_daily_note_path(date)
assert path == "journal/2026/2026-03-15.md"
assert path == "50 - Journal/2026/03/2026-03-15.md"
def test_defaults_to_today(self, service):
path = service.get_daily_note_path()
today = datetime.now()
assert today.strftime("%Y-%m-%d") in path
assert path.startswith(f"journal/{today.strftime('%Y')}/")
assert path.startswith(
f"50 - Journal/{today.strftime('%Y')}/{today.strftime('%m')}/"
)
class TestWalkVault:
-137
View File
@@ -1,137 +0,0 @@
import os
from math import ceil
import re
from typing import Union
from uuid import UUID, uuid4
from chromadb.utils.embedding_functions.openai_embedding_function import (
OpenAIEmbeddingFunction,
)
from dotenv import load_dotenv
from llm import LLMClient
load_dotenv()
def remove_headers_footers(text, header_patterns=None, footer_patterns=None):
if header_patterns is None:
header_patterns = [r"^.*Header.*$"]
if footer_patterns is None:
footer_patterns = [r"^.*Footer.*$"]
for pattern in header_patterns + footer_patterns:
text = re.sub(pattern, "", text, flags=re.MULTILINE)
return text.strip()
def remove_special_characters(text, special_chars=None):
if special_chars is None:
special_chars = r"[^A-Za-z0-9\s\.,;:\'\"\?\!\-]"
text = re.sub(special_chars, "", text)
return text.strip()
def remove_repeated_substrings(text, pattern=r"\.{2,}"):
text = re.sub(pattern, ".", text)
return text.strip()
def remove_extra_spaces(text):
text = re.sub(r"\n\s*\n", "\n\n", text)
text = re.sub(r"\s+", " ", text)
return text.strip()
def preprocess_text(text):
# Remove headers and footers
text = remove_headers_footers(text)
# Remove special characters
text = remove_special_characters(text)
# Remove repeated substrings like dots
text = remove_repeated_substrings(text)
# Remove extra spaces between lines and within lines
text = remove_extra_spaces(text)
# Additional cleaning steps can be added here
return text.strip()
class Chunk:
def __init__(
self,
text: str,
size: int,
document_id: UUID,
chunk_id: int,
embedding,
):
self.text = text
self.size = size
self.document_id = document_id
self.chunk_id = chunk_id
self.embedding = embedding
class Chunker:
def __init__(self, collection) -> None:
self.collection = collection
self.llm_client = LLMClient()
def embedding_fx(self, inputs):
openai_embedding_fx = OpenAIEmbeddingFunction(
api_key=os.getenv("OPENAI_API_KEY"),
model_name="text-embedding-3-small",
)
return openai_embedding_fx(inputs)
def chunk_document(
self,
document: str,
chunk_size: int = 1000,
metadata: dict[str, Union[str, float]] = {},
) -> list[Chunk]:
doc_uuid = uuid4()
chunk_size = min(chunk_size, len(document)) or 1
chunks = []
num_chunks = ceil(len(document) / chunk_size)
document_length = len(document)
for i in range(num_chunks):
curr_pos = i * num_chunks
to_pos = (
curr_pos + chunk_size
if curr_pos + chunk_size < document_length
else document_length
)
text_chunk = self.clean_document(document[curr_pos:to_pos])
embedding = self.embedding_fx([text_chunk])
self.collection.add(
ids=[str(doc_uuid) + ":" + str(i)],
documents=[text_chunk],
embeddings=embedding,
metadatas=[metadata],
)
return chunks
def clean_document(self, document: str) -> str:
"""This function will remove information that is noise or already known.
Example: We already know all the things in here are Simba-related, so we don't need things like
"Sumamry of simba's visit"
"""
document = document.replace("\\n", "")
document = document.strip()
return preprocess_text(document)
+55 -15
View File
@@ -61,7 +61,9 @@ class ObsidianService:
return md_files
def parse_markdown(self, content: str, filepath: Optional[Path] = None) -> dict[str, Any]:
def parse_markdown(
self, content: str, filepath: Optional[Path] = None
) -> dict[str, Any]:
"""Parse Obsidian markdown to extract metadata and clean content.
Args:
@@ -85,7 +87,7 @@ class ObsidianService:
if match:
frontmatter = match.group(1)
body_content = content[match.end():].strip()
body_content = content[match.end() :].strip()
try:
metadata = yaml.safe_load(frontmatter) or {}
except yaml.YAMLError:
@@ -104,8 +106,12 @@ class ObsidianService:
embeds = [e.split(":")[0].strip() if ":" in e else e.strip() for e in embeds]
# Clean body content
# Remove wikilinks [[...]] and embeds [[!...]]
cleaned_content = re.sub(r"\[\[.*?\]\]", "", body_content)
# Remove embeds ![[...]]
cleaned_content = re.sub(r"!\[\[.*?\]\]", "", body_content)
# Convert wikilinks to display text: [[target|display]] → display, [[target]] → target
cleaned_content = re.sub(
r"\[\[([^\]|]+\|)?([^\]]+)\]\]", r"\2", cleaned_content
)
cleaned_content = re.sub(r"\n{3,}", "\n\n", cleaned_content).strip()
return {
@@ -187,7 +193,9 @@ class ObsidianService:
default_frontmatter.setdefault("tags", []).extend(tags)
# Write note
frontmatter_yaml = yaml.dump(default_frontmatter, allow_unicode=True, default_flow_style=False)
frontmatter_yaml = yaml.dump(
default_frontmatter, allow_unicode=True, default_flow_style=False
)
full_content = f"---\n{frontmatter_yaml}---\n\n{content}"
with open(note_path, "w", encoding="utf-8") as f:
@@ -247,7 +255,7 @@ class ObsidianService:
"""
if date is None:
date = datetime.now()
return f"journal/{date.strftime('%Y')}/{date.strftime('%Y-%m-%d')}.md"
return f"50 - Journal/{date.strftime('%Y')}/{date.strftime('%m')}/{date.strftime('%Y-%m-%d')}.md"
def get_daily_note(self, date: Optional[datetime] = None) -> dict[str, Any]:
"""Read a daily note from the vault.
@@ -264,12 +272,22 @@ class ObsidianService:
note_path = Path(self.vault_path) / relative_path
if not note_path.exists():
return {"found": False, "path": relative_path, "content": None, "date": date.strftime("%Y-%m-%d")}
return {
"found": False,
"path": relative_path,
"content": None,
"date": date.strftime("%Y-%m-%d"),
}
with open(note_path, "r", encoding="utf-8") as f:
content = f.read()
return {"found": True, "path": relative_path, "content": content, "date": date.strftime("%Y-%m-%d")}
return {
"found": True,
"path": relative_path,
"content": content,
"date": date.strftime("%Y-%m-%d"),
}
def get_daily_tasks(self, date: Optional[datetime] = None) -> dict[str, Any]:
"""Extract tasks from a daily note's tasks section.
@@ -284,7 +302,12 @@ class ObsidianService:
date = datetime.now()
note = self.get_daily_note(date)
if not note["found"]:
return {"found": False, "tasks": [], "date": note["date"], "path": note["path"]}
return {
"found": False,
"tasks": [],
"date": note["date"],
"path": note["path"],
}
tasks = []
in_tasks = False
@@ -302,9 +325,16 @@ class ObsidianService:
elif todo_match:
tasks.append({"text": todo_match.group(1), "done": False})
return {"found": True, "tasks": tasks, "date": note["date"], "path": note["path"]}
return {
"found": True,
"tasks": tasks,
"date": note["date"],
"path": note["path"],
}
def add_task_to_daily_note(self, task_text: str, date: Optional[datetime] = None) -> dict[str, Any]:
def add_task_to_daily_note(
self, task_text: str, date: Optional[datetime] = None
) -> dict[str, Any]:
"""Add a task checkbox to a daily note, creating the note if needed.
Args:
@@ -336,7 +366,9 @@ class ObsidianService:
log_match = re.search(r"\n(### log)", content, re.IGNORECASE)
if log_match:
insert_pos = log_match.start()
content = content[:insert_pos] + f"\n- [ ] {task_text}" + content[insert_pos:]
content = (
content[:insert_pos] + f"\n- [ ] {task_text}" + content[insert_pos:]
)
else:
content = content.rstrip() + f"\n- [ ] {task_text}\n"
@@ -345,7 +377,9 @@ class ObsidianService:
return {"success": True, "created_note": False, "path": relative_path}
def complete_task_in_daily_note(self, task_text: str, date: Optional[datetime] = None) -> dict[str, Any]:
def complete_task_in_daily_note(
self, task_text: str, date: Optional[datetime] = None
) -> dict[str, Any]:
"""Mark a task as complete in a daily note by matching task text.
Searches for a task matching the given text (exact or partial) and
@@ -374,9 +408,15 @@ class ObsidianService:
if exact in content:
content = content.replace(exact, f"- [x] {task_text}", 1)
else:
match = re.search(r"- \[ \] .*" + re.escape(task_text) + r".*", content, re.IGNORECASE)
match = re.search(
r"- \[ \] .*" + re.escape(task_text) + r".*", content, re.IGNORECASE
)
if not match:
return {"success": False, "error": f"Task '{task_text}' not found", "path": relative_path}
return {
"success": False,
"error": f"Task '{task_text}' not found",
"path": relative_path,
}
completed = match.group(0).replace("- [ ]", "- [x]", 1)
content = content.replace(match.group(0), completed, 1)
task_text = match.group(0).replace("- [ ] ", "")
+35
View File
@@ -0,0 +1,35 @@
import re
def strip_markdown(text: str) -> str:
"""Strip markdown formatting from text for plain-text channels like iMessage."""
# Code blocks (fenced)
text = re.sub(
r"```[\s\S]*?```", lambda m: re.sub(r"```\w*\n?", "", m.group()), text
)
# Inline code
text = re.sub(r"`([^`]+)`", r"\1", text)
# Images
text = re.sub(r"!\[([^\]]*)\]\([^)]+\)", r"\1", text)
# Links — keep the link text
text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
# Bold/italic (order matters: bold+italic first)
text = re.sub(r"\*\*\*(.+?)\*\*\*", r"\1", text)
text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
text = re.sub(r"\*(.+?)\*", r"\1", text)
text = re.sub(r"___(.+?)___", r"\1", text)
text = re.sub(r"__(.+?)__", r"\1", text)
text = re.sub(r"_(.+?)_", r"\1", text)
# Headers
text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
# Horizontal rules
text = re.sub(r"^[-*_]{3,}\s*$", "", text, flags=re.MULTILINE)
# Bullet lists — remove the bullet marker
text = re.sub(r"^[\s]*[-*+]\s+", "", text, flags=re.MULTILINE)
# Numbered lists — remove the number marker
text = re.sub(r"^[\s]*\d+\.\s+", "", text, flags=re.MULTILINE)
# Blockquotes
text = re.sub(r"^>\s?", "", text, flags=re.MULTILINE)
# Collapse multiple blank lines
text = re.sub(r"\n{3,}", "\n\n", text)
return text.strip()
Generated
+92 -995
View File
File diff suppressed because it is too large Load Diff