docs(01): create phase plan

Phase 01: Foundation
- 2 plan(s) in 2 wave(s)
- 1 parallel, 1 sequential
- Ready for execution
2026-02-07 13:35:48 -05:00
parent 126b53f17d
commit 800c6fef7f
3 changed files with 507 additions and 3 deletions


@@ -0,0 +1,208 @@
---
phase: 01-foundation
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- blueprints/email/__init__.py
- blueprints/email/models.py
- blueprints/email/crypto_service.py
- .env.example
- migrations/models/XX_YYYYMMDDHHMMSS_add_email_tables.py
autonomous: true
must_haves:
  truths:
    - "Database tables for email_accounts, email_sync_status, and emails exist in PostgreSQL"
    - "IMAP credentials are encrypted when stored and decrypted when retrieved"
    - "Fernet encryption key can be generated and validated on app startup"
  artifacts:
    - path: "blueprints/email/models.py"
      provides: "EmailAccount, EmailSyncStatus, Email Tortoise ORM models"
      min_lines: 80
      contains: "class EmailAccount(Model)"
    - path: "blueprints/email/crypto_service.py"
      provides: "EncryptedTextField and Fernet key validation"
      min_lines: 40
      exports: ["EncryptedTextField", "validate_fernet_key"]
    - path: ".env.example"
      provides: "FERNET_KEY environment variable example"
      contains: "FERNET_KEY="
    - path: "migrations/models/"
      provides: "Database migration for email tables"
      pattern: "*_add_email_tables.py"
  key_links:
    - from: "blueprints/email/models.py"
      to: "blueprints/email/crypto_service.py"
      via: "EncryptedTextField import"
      pattern: "from.*crypto_service import EncryptedTextField"
    - from: "blueprints/email/models.py"
      to: "blueprints/users/models.py"
      via: "ForeignKeyField to User"
      pattern: 'fields\.ForeignKeyField\("models\.User"'
---
<objective>
Establish database foundation and credential encryption for email ingestion system.
Purpose: Create the data layer that stores email account configuration, sync tracking, and email metadata. Implement secure credential storage using Fernet symmetric encryption so IMAP passwords can be safely stored and retrieved.
Output: Tortoise ORM models for email entities, encrypted password field implementation, database migration, and environment configuration.
</objective>
<execution_context>
@/Users/ryanchen/.claude/get-shit-done/workflows/execute-plan.md
@/Users/ryanchen/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-foundation/01-RESEARCH.md
@blueprints/users/models.py
@blueprints/conversation/models.py
@.env.example
</context>
<tasks>
<task type="auto">
<name>Task 1: Create email blueprint with encrypted Tortoise ORM models</name>
<files>
blueprints/email/__init__.py
blueprints/email/models.py
blueprints/email/crypto_service.py
</files>
<action>
Create `blueprints/email/` directory with three files following existing blueprint patterns:
**1. crypto_service.py** - Implement Fernet encryption for credentials:
- Create `EncryptedTextField` class extending `fields.TextField`
- Override `to_db_value()` to encrypt strings before database storage
- Override `to_python_value()` to decrypt strings when loading from database
- Load FERNET_KEY from environment variable in `__init__`
- Raise ValueError if FERNET_KEY is missing or invalid
- Add `validate_fernet_key()` function that tests encrypt/decrypt cycle
- Follow pattern from RESEARCH.md Example 2 (lines 581-619)
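A minimal sketch of the Fernet logic that `EncryptedTextField` would delegate to, assuming the `cryptography` package and the `FERNET_KEY` environment variable described above. Only `validate_fernet_key` comes from this plan; `load_fernet`, `encrypt_value` and `decrypt_value` are hypothetical helper names, and the Tortoise field subclass itself is not shown.

```python
import os

from cryptography.fernet import Fernet, InvalidToken


def load_fernet() -> Fernet:
    """Build a Fernet instance from FERNET_KEY, raising ValueError if unusable."""
    key = os.environ.get("FERNET_KEY")
    if not key:
        raise ValueError("FERNET_KEY environment variable is not set")
    try:
        # Fernet() rejects anything that is not 32 bytes of URL-safe base64.
        return Fernet(key.encode())
    except ValueError as exc:
        raise ValueError(f"FERNET_KEY is invalid: {exc}") from exc


def encrypt_value(fernet: Fernet, plaintext: str) -> str:
    # Fernet tokens are URL-safe base64 bytes, so they can be stored as text.
    return fernet.encrypt(plaintext.encode("utf-8")).decode("ascii")


def decrypt_value(fernet: Fernet, token: str) -> str:
    # Raises InvalidToken if the token was produced with a different key.
    return fernet.decrypt(token.encode("ascii")).decode("utf-8")


def validate_fernet_key() -> None:
    """Round-trip a probe value; raise ValueError if the key is missing or broken."""
    fernet = load_fernet()
    probe = "fernet-key-check"
    try:
        if decrypt_value(fernet, encrypt_value(fernet, probe)) != probe:
            raise ValueError("FERNET_KEY round-trip produced a different value")
    except InvalidToken as exc:
        raise ValueError("FERNET_KEY failed encrypt/decrypt round-trip") from exc
```

In this sketch, `to_db_value()` would call `encrypt_value` and `to_python_value()` would call `decrypt_value`, passing `None` through unchanged for nullable columns.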
**2. models.py** - Create three Tortoise ORM models following existing patterns:
`EmailAccount`:
- UUIDField primary key
- ForeignKeyField to models.User (related_name="email_accounts")
- email_address CharField(255) unique
- display_name CharField(255) nullable
- imap_host CharField(255)
- imap_port IntField default=993
- imap_username CharField(255)
- imap_password EncryptedTextField() - transparently encrypted
- is_active BooleanField default=True
- last_error TextField nullable
- created_at/updated_at DatetimeField with auto_now_add/auto_now
- Meta: table = "email_accounts"
`EmailSyncStatus`:
- UUIDField primary key
- ForeignKeyField to EmailAccount (related_name="sync_status", unique=True)
- last_sync_date DatetimeField nullable
- last_message_uid IntField default=0
- message_count IntField default=0
- consecutive_failures IntField default=0
- last_failure_date DatetimeField nullable
- updated_at DatetimeField auto_now
- Meta: table = "email_sync_status"
`Email`:
- UUIDField primary key
- ForeignKeyField to EmailAccount (related_name="emails")
- message_id CharField(255) unique, indexed (RFC822 Message-ID)
- subject CharField(500)
- from_address CharField(255)
- to_address TextField
- date DatetimeField
- body_text TextField nullable
- body_html TextField nullable
- chromadb_doc_id CharField(255) nullable
- created_at DatetimeField auto_now_add
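- expires_at DatetimeField (auto-set to 30 days after creation)
- Override async save() to set expires_at if it is unset, computing it from created_at when available and from the current UTC time otherwise (auto_now_add only fills created_at during the save itself)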
- Meta: table = "emails"
Follow conventions from blueprints/conversation/models.py and blueprints/users/models.py.
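The `expires_at` rule needs care because on a first insert the overridden `save()` runs before Tortoise fills `created_at`, so `created_at` can still be `None`. A minimal sketch of the date arithmetic, assuming timezone-aware UTC datetimes; `compute_expires_at` and `EMAIL_RETENTION` are hypothetical names, and the Tortoise `save()` override is only outlined in comments.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window from the plan: emails expire 30 days after creation.
EMAIL_RETENTION = timedelta(days=30)


def compute_expires_at(
    created_at: Optional[datetime], now: Optional[datetime] = None
) -> datetime:
    """Return the expiry timestamp for an email row.

    Falls back to `now` (or the current UTC time) when created_at has not
    been populated yet, e.g. before auto_now_add runs on first insert.
    """
    base = created_at or now or datetime.now(timezone.utc)
    return base + EMAIL_RETENTION


# Sketch of the Email.save() override that would use it:
#     async def save(self, *args, **kwargs):
#         if self.expires_at is None:
#             self.expires_at = compute_expires_at(self.created_at)
#         await super().save(*args, **kwargs)
```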
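**3. __init__.py** - Create a minimal blueprint registration file (no routes yet):
- Create Quart Blueprint named "email_blueprint" with url_prefix="/api/email"
- Import models so they load with the blueprint. Tortoise ORM only registers models listed in its config, so also add `blueprints.email.models` to the app's Tortoise ORM config (the module list Aerich reads); otherwise `aerich migrate` in Task 2 will not detect the new tables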
- Add comment: "Routes will be added in Phase 2"
Use imports matching existing patterns: `from tortoise import fields`, `from tortoise.models import Model`.
</action>
<verify>
- `cat blueprints/email/crypto_service.py` shows EncryptedTextField class with to_db_value/to_python_value methods
- `cat blueprints/email/models.py` shows three model classes with correct field definitions
- `python -c "from blueprints.email.models import EmailAccount, EmailSyncStatus, Email; print('Models import OK')"` succeeds (run with a valid FERNET_KEY exported, since EncryptedTextField reads the key when the model classes are defined)
- `grep -n "EncryptedTextField" blueprints/email/models.py` shows import and usage in EmailAccount.imap_password
</verify>
<done>All three blueprint files exist, EmailAccount has an encrypted password field, all models follow Tortoise ORM conventions, and imports resolve without errors</done>
</task>
<task type="auto">
<name>Task 2: Add FERNET_KEY to environment configuration and generate migration</name>
<files>
.env.example
migrations/models/XX_YYYYMMDDHHMMSS_add_email_tables.py
</files>
<action>
**1. Update .env.example:**
- Add section header: `# Email Integration`
- Add FERNET_KEY with generation instructions:
```
# Email Encryption Key (Fernet: URL-safe base64-encoded 32-byte key)
# Generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
FERNET_KEY=your-fernet-key-here
```
**2. Generate Aerich migration:**
Run `aerich migrate --name add_email_tables` inside the Docker container to create the migration for the email_accounts, email_sync_status, and emails tables.
The migration is auto-generated from the Tortoise ORM models defined in Task 1.
From the host, run it through the dev container (start it first with `docker compose -f docker-compose.dev.yml up -d` if it is not running): `docker compose -f docker-compose.dev.yml exec raggr aerich migrate --name add_email_tables`
Verify that the migration file was created in migrations/models/ with a timestamp prefix.
</action>
<verify>
- `grep FERNET_KEY .env.example` shows encryption key configuration
- `ls migrations/models/*_add_email_tables.py` shows migration file exists
- `cat migrations/models/*_add_email_tables.py` shows CREATE TABLE statements for email_accounts, email_sync_status, emails
</verify>
<done>FERNET_KEY documented in .env.example with generation command, migration file exists with email table definitions</done>
</task>
</tasks>
<verification>
After task completion:
1. Run `python -c "from blueprints.email.crypto_service import validate_fernet_key; import os; os.environ['FERNET_KEY']='test'; validate_fernet_key()"` - should raise ValueError for invalid key
2. Run `python -c "from cryptography.fernet import Fernet; import os; os.environ['FERNET_KEY']=Fernet.generate_key().decode(); from blueprints.email.crypto_service import validate_fernet_key; validate_fernet_key(); print('✓ Encryption validated')"` - should succeed
3. Check `aerich history` shows new migration in list
4. Run `aerich upgrade` to apply migration (creates tables in database)
5. Verify tables exist: `docker compose -f docker-compose.dev.yml exec postgres psql -U raggr -d raggr -c "\dt email*"` - should list three tables
</verification>
<success_criteria>
- EmailAccount model has encrypted imap_password field that uses EncryptedTextField
- EmailSyncStatus model tracks last sync state with unique foreign key to EmailAccount
- Email model stores message metadata with 30-day expiration logic in save()
- EncryptedTextField transparently encrypts/decrypts using Fernet
- validate_fernet_key() function can detect invalid or missing keys
- Database migration exists and can create three email tables
- .env.example documents FERNET_KEY with generation command
- All models follow existing codebase conventions (snake_case, async patterns, field types)
</success_criteria>
<output>
After completion, create `.planning/phases/01-foundation/01-01-SUMMARY.md`
</output>


@@ -0,0 +1,295 @@
---
phase: 01-foundation
plan: 02
type: execute
wave: 2
depends_on: ["01-01"]
files_modified:
- blueprints/email/imap_service.py
- blueprints/email/parser_service.py
- pyproject.toml
autonomous: true
must_haves:
  truths:
    - "IMAP service can connect to mail server and authenticate with credentials"
    - "IMAP service can list mailbox folders and return parsed folder names"
    - "Email parser extracts plain text and HTML bodies from multipart messages"
    - "Email parser handles emails with only text, only HTML, or both formats"
  artifacts:
    - path: "blueprints/email/imap_service.py"
      provides: "IMAP connection and folder listing"
      min_lines: 60
      exports: ["IMAPService"]
    - path: "blueprints/email/parser_service.py"
      provides: "Email body parsing from RFC822 bytes"
      min_lines: 50
      exports: ["parse_email_body"]
    - path: "pyproject.toml"
      provides: "aioimaplib and html2text dependencies"
      contains: "aioimaplib"
  key_links:
    - from: "blueprints/email/imap_service.py"
      to: "aioimaplib.IMAP4_SSL"
      via: "import and instantiation"
      pattern: "from aioimaplib import IMAP4_SSL"
    - from: "blueprints/email/parser_service.py"
      to: "email.message_from_bytes"
      via: "stdlib email module"
      pattern: "from email import message_from_bytes"
    - from: "blueprints/email/imap_service.py"
      to: "blueprints/email/models.EmailAccount"
      via: "type hints for account parameter"
      pattern: "account: EmailAccount"
---
<objective>
Build IMAP connection utility and email parsing service for retrieving and processing email messages.
Purpose: Create the integration layer that communicates with IMAP mail servers and parses RFC822 email format into usable text content. These services enable the system to fetch emails and extract meaningful text for RAG indexing.
Output: IMAPService class with async connection handling, folder listing, and proper cleanup. Email parsing function that extracts text/HTML bodies from multipart MIME messages.
</objective>
<execution_context>
@/Users/ryanchen/.claude/get-shit-done/workflows/execute-plan.md
@/Users/ryanchen/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md
@blueprints/email/models.py
@utils/ynab_service.py
@utils/mealie_service.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement IMAP connection service with authentication and folder listing</name>
<files>
blueprints/email/imap_service.py
pyproject.toml
</files>
<action>
**1. Add dependencies to pyproject.toml:**
- Add to `dependencies` array: `"aioimaplib>=2.0.1"` and `"html2text>=2025.4.15"`
- Run `pip install aioimaplib html2text` to install
**2. Create imap_service.py with IMAPService class:**
Implement async IMAP client following patterns from RESEARCH.md (lines 116-188, 494-577):
```python
import asyncio
import logging
from typing import Optional

from aioimaplib import IMAP4_SSL

logger = logging.getLogger(__name__)


class IMAPService:
    """Async IMAP client for email operations."""

    async def connect(
        self,
        host: str,
        username: str,
        password: str,
        port: int = 993,
        timeout: int = 10
    ) -> IMAP4_SSL:
        """
        Establish IMAP connection with authentication.
        Returns authenticated IMAP4_SSL client.
        Raises exception on connection or auth failure.
        Must call close() to properly disconnect.
        """
        # Create connection with timeout
        # Wait for server greeting
        # Authenticate with login()
        # Return authenticated client
        # On failure: call logout() and raise

    async def list_folders(self, imap: IMAP4_SSL) -> list[str]:
        """
        List all mailbox folders.
        Returns list of folder names (e.g., ["INBOX", "Sent", "Drafts"]).
        """
        # Call imap.list('""', '*')
        # Parse LIST response lines
        # Extract folder names from response format: (* LIST (...) "/" "INBOX")
        # Return cleaned folder names

    async def close(self, imap: IMAP4_SSL) -> None:
        """
        Properly close IMAP connection.
        CRITICAL: Must use logout(), not close().
        close() only closes mailbox, logout() closes TCP connection.
        """
        # Try/except for best-effort cleanup
        # Call await imap.logout()
```
Key implementation details:
- Import `IMAP4_SSL` from aioimaplib
- Use `await imap.wait_hello_from_server()` after instantiation
- Use `await imap.login(username, password)` for authentication
- Always call `logout()` not `close()` to close TCP connection
- Handle connection errors with try/except and logger.error
- Use logger with prefix `[IMAP]` for operations and `[IMAP ERROR]` for failures
- Follow async patterns from existing service classes (ynab_service.py, mealie_service.py)
**Anti-patterns to avoid** (from RESEARCH.md lines 331-339):
- Don't use imap.close() for disconnect (only closes mailbox)
- Don't share one connection across concurrent tasks (each IMAP session carries its own state, such as the selected mailbox)
- Always logout() in finally block for cleanup
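Parsing the LIST reply is the fiddliest step of `list_folders()`. A minimal stdlib sketch, assuming `list_folders()` passes it the `lines` of the response from `await imap.list('""', '*')` as bytes shaped like the RFC 3501 LIST format quoted in the skeleton comment, with or without the leading `* LIST ` (how much of that prefix aioimaplib strips is worth confirming against real output). `parse_list_response` is a hypothetical helper; modified UTF-7 names and IMAP literal (`{n}`) syntax are not handled.

```python
import re

# Matches "(<flags>) <delimiter> <name>", e.g. (\HasNoChildren) "/" "INBOX",
# optionally preceded by "* " and/or "LIST ".
_LIST_LINE = re.compile(
    rb'^(?:\* )?(?:LIST )?\((?P<flags>[^)]*)\) (?P<delim>"(?:[^"\\]|\\.)*"|NIL) (?P<name>.+)$'
)


def parse_list_response(lines: list[bytes]) -> list[str]:
    """Extract folder names from IMAP LIST response lines."""
    folders: list[str] = []
    for line in lines:
        match = _LIST_LINE.match(line.strip())
        if match is None:
            # Skips status lines such as b'LIST completed.'
            continue
        name = match.group("name").decode("utf-8", errors="replace")
        if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
            # Quoted string: drop the quotes and unescape \" and \\.
            name = re.sub(r'\\(.)', r'\1', name[1:-1])
        folders.append(name)
    return folders
```

Keeping the parsing in a pure function lets it be unit-tested without a mail server, while `connect()`/`close()` stay thin wrappers around aioimaplib.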
</action>
<verify>
- `cat blueprints/email/imap_service.py` shows IMAPService class with connect/list_folders/close methods
- `python -c "from blueprints.email.imap_service import IMAPService; print('✓ IMAPService imports')"` succeeds
- `grep "await imap.logout()" blueprints/email/imap_service.py` shows proper cleanup
- `grep "aioimaplib" pyproject.toml` shows dependency added
</verify>
<done>IMAPService class exists with async connect/list_folders/close methods, uses aioimaplib correctly with logout() for cleanup, dependencies added to pyproject.toml</done>
</task>
<task type="auto">
<name>Task 2: Create email body parser for multipart MIME messages</name>
<files>
blueprints/email/parser_service.py
</files>
<action>
Create parser_service.py with email parsing function following RESEARCH.md patterns (lines 190-239, 494-577):
```python
import logging
from email import message_from_bytes
from email.policy import default
from email.utils import parsedate_to_datetime
from typing import Optional

import html2text

logger = logging.getLogger(__name__)


def parse_email_body(raw_email_bytes: bytes) -> dict:
    """
    Extract text and HTML bodies from RFC822 email bytes.

    Args:
        raw_email_bytes: Raw email message bytes from IMAP FETCH

    Returns:
        Dictionary with keys:
        - "text": Plain text body (None if not present)
        - "html": HTML body (None if not present)
        - "preferred": Best available body (text preferred, HTML converted if text missing)
        - "subject": Email subject
        - "from": Sender address
        - "to": Recipient address(es)
        - "date": Parsed datetime object
        - "message_id": RFC822 Message-ID header
    """
    # Parse with modern EmailMessage API and default policy
    # Use msg.get_body(preferencelist=('plain',)) for text part
    # Use msg.get_body(preferencelist=('html',)) for HTML part
    # Call get_content() on parts for proper decoding (not get_payload())
    # If text exists: preferred = text
    # If text missing and HTML exists: convert HTML to text with html2text
    # Extract metadata: subject, from, to, date, message-id
    # Use parsedate_to_datetime() for date parsing
    # Return dictionary with all fields
```
Implementation details:
- Use `message_from_bytes(raw_email_bytes, policy=default)` for modern API
- Use `msg.get_body(preferencelist=(...))` to handle multipart/alternative correctly
- Call `part.get_content()` not `part.get_payload()` for proper decoding (handles encoding automatically)
- For HTML conversion: `h = html2text.HTML2Text(); h.ignore_links = False; text = h.handle(html_body)`
- Handle missing headers gracefully: `msg.get("header-name", "")` returns empty string if missing
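- Use `parsedate_to_datetime()` from email.utils to parse the Date header into a datetime object; catch `ValueError` (and `TypeError` on Python older than 3.10) for malformed dates and store None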
- Log errors with `[EMAIL PARSER]` prefix
- Handle UnicodeDecodeError by logging and returning partial data
**Key insight from RESEARCH.md** (lines 389-399):
- Use `email.policy.default` for modern encoding handling
- Call `get_content()` not `get_payload()` to avoid encoding issues
- Prefer plain text over HTML for RAG indexing (less boilerplate)
Follow function signature and return type from RESEARCH.md Example 3 (lines 196-238).
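A minimal sketch of the body-selection and metadata logic described above, runnable against in-memory messages. It assumes only the stdlib plus `html2text`, which is imported lazily so that messages with a plain-text part never need it; the `UnicodeDecodeError` handling from the list is omitted for brevity, and `_parse_date` is a hypothetical helper.

```python
import logging
from datetime import datetime
from email import message_from_bytes
from email.policy import default
from email.utils import parsedate_to_datetime
from typing import Optional

logger = logging.getLogger(__name__)


def _parse_date(value: str) -> Optional[datetime]:
    try:
        return parsedate_to_datetime(value) if value else None
    except (TypeError, ValueError):
        logger.warning("[EMAIL PARSER] Unparseable Date header: %r", value)
        return None


def parse_email_body(raw_email_bytes: bytes) -> dict:
    # policy=default returns an EmailMessage with get_body()/get_content().
    msg = message_from_bytes(raw_email_bytes, policy=default)

    # get_body() searches multipart/alternative (or the message itself when it
    # is a single text part) and returns the matching part, or None.
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    text = text_part.get_content() if text_part is not None else None
    html = html_part.get_content() if html_part is not None else None

    preferred = text
    if not preferred and html:
        import html2text  # only needed for HTML-only messages

        converter = html2text.HTML2Text()
        converter.ignore_links = False
        preferred = converter.handle(html)

    return {
        "text": text,
        "html": html,
        "preferred": preferred,
        "subject": str(msg.get("subject", "")),
        "from": str(msg.get("from", "")),
        "to": str(msg.get("to", "")),
        "date": _parse_date(str(msg.get("date", ""))),
        "message_id": str(msg.get("message-id", "")),
    }
```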
</action>
<verify>
- `cat blueprints/email/parser_service.py` shows parse_email_body function
- `python -c "from blueprints.email.parser_service import parse_email_body; print('✓ Parser imports')"` succeeds
- `grep "message_from_bytes" blueprints/email/parser_service.py` shows stdlib email module usage
- `grep "get_body" blueprints/email/parser_service.py` shows modern EmailMessage API usage
- `grep "html2text" blueprints/email/parser_service.py` shows HTML conversion
</verify>
<done>parse_email_body function exists, extracts text/HTML bodies using modern email.message API, converts HTML to text when needed, returns complete metadata dictionary</done>
</task>
</tasks>
<verification>
After task completion:
1. Test IMAP connection (requires test IMAP server or skip):
```python
from blueprints.email.imap_service import IMAPService
import asyncio

async def test():
    service = IMAPService()
    # Connect to test server (e.g., imap.gmail.com)
    # Test will be done in Phase 2 with real accounts
    print("✓ IMAPService ready for testing")

asyncio.run(test())
```
2. Test email parsing with sample RFC822 message:
```python
from blueprints.email.parser_service import parse_email_body

# Create minimal RFC822 message (a blank line separates headers from body)
sample = b"""From: sender@example.com
To: recipient@example.com
Subject: Test Email
Message-ID: <test123@example.com>
Date: Sat, 7 Feb 2026 10:00:00 -0800
Content-Type: text/plain; charset="utf-8"

This is the email body.
"""
result = parse_email_body(sample)
assert result["subject"] == "Test Email"
assert "email body" in result["text"]
assert result["preferred"] is not None
print("✓ Email parsing works")
```
3. Verify dependencies installed: `pip list | grep -E "(aioimaplib|html2text)"` shows both packages
</verification>
<success_criteria>
- IMAPService can establish connection with host/username/password/port parameters
- IMAPService.connect() returns authenticated IMAP4_SSL client
- IMAPService.list_folders() parses IMAP LIST response and returns folder names
- IMAPService.close() calls logout() for proper TCP cleanup
- parse_email_body() extracts text and HTML bodies from RFC822 bytes
- parse_email_body() prefers plain text over HTML for "preferred" field
- parse_email_body() converts HTML to text using html2text when text body missing
- parse_email_body() extracts all metadata: subject, from, to, date, message_id
- Both services follow async patterns and logging conventions from existing codebase
- Dependencies (aioimaplib, html2text) added to pyproject.toml and installed
</success_criteria>
<output>
After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md`
</output>