Phase 01: Foundation - 2 plan(s) in 2 wave(s) - 1 parallel, 1 sequential - Ready for execution
---
phase: 01-foundation
plan: 02
type: execute
wave: 2
depends_on: ["01-01"]
files_modified:
  - blueprints/email/imap_service.py
  - blueprints/email/parser_service.py
  - pyproject.toml
autonomous: true

must_haves:
  truths:
    - "IMAP service can connect to mail server and authenticate with credentials"
    - "IMAP service can list mailbox folders and return parsed folder names"
    - "Email parser extracts plain text and HTML bodies from multipart messages"
    - "Email parser handles emails with only text, only HTML, or both formats"
  artifacts:
    - path: "blueprints/email/imap_service.py"
      provides: "IMAP connection and folder listing"
      min_lines: 60
      exports: ["IMAPService"]
    - path: "blueprints/email/parser_service.py"
      provides: "Email body parsing from RFC822 bytes"
      min_lines: 50
      exports: ["parse_email_body"]
    - path: "pyproject.toml"
      provides: "aioimaplib and html2text dependencies"
      contains: "aioimaplib"
  key_links:
    - from: "blueprints/email/imap_service.py"
      to: "aioimaplib.IMAP4_SSL"
      via: "import and instantiation"
      pattern: "from aioimaplib import IMAP4_SSL"
    - from: "blueprints/email/parser_service.py"
      to: "email.message_from_bytes"
      via: "stdlib email module"
      pattern: "from email import message_from_bytes"
    - from: "blueprints/email/imap_service.py"
      to: "blueprints/email/models.EmailAccount"
      via: "type hints for account parameter"
      pattern: "account: EmailAccount"
---

<objective>
Build IMAP connection utility and email parsing service for retrieving and processing email messages.

Purpose: Create the integration layer that communicates with IMAP mail servers and parses RFC822 email format into usable text content. These services enable the system to fetch emails and extract meaningful text for RAG indexing.

Output: IMAPService class with async connection handling, folder listing, and proper cleanup. Email parsing function that extracts text/HTML bodies from multipart MIME messages.
</objective>

<execution_context>
@/Users/ryanchen/.claude/get-shit-done/workflows/execute-plan.md
@/Users/ryanchen/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md
@blueprints/email/models.py
@utils/ynab_service.py
@utils/mealie_service.py
</context>

<tasks>

<task type="auto">
<name>Task 1: Implement IMAP connection service with authentication and folder listing</name>
<files>
blueprints/email/imap_service.py
pyproject.toml
</files>
<action>
**1. Add dependencies to pyproject.toml:**
- Add to `dependencies` array: `"aioimaplib>=2.0.1"` and `"html2text>=2025.4.15"`
- Run `pip install aioimaplib html2text` to install

**2. Create imap_service.py with IMAPService class:**

Implement async IMAP client following patterns from RESEARCH.md (lines 116-188, 494-577):

```python
import asyncio
import logging
from typing import Optional
from aioimaplib import IMAP4_SSL

logger = logging.getLogger(__name__)


class IMAPService:
    """Async IMAP client for email operations."""

    async def connect(
        self,
        host: str,
        username: str,
        password: str,
        port: int = 993,
        timeout: int = 10
    ) -> IMAP4_SSL:
        """
        Establish IMAP connection with authentication.

        Returns authenticated IMAP4_SSL client.
        Raises exception on connection or auth failure.
        Must call close() to properly disconnect.
        """
        # Create connection with timeout
        # Wait for server greeting
        # Authenticate with login()
        # Return authenticated client
        # On failure: call logout() and raise

    async def list_folders(self, imap: IMAP4_SSL) -> list[str]:
        """
        List all mailbox folders.

        Returns list of folder names (e.g., ["INBOX", "Sent", "Drafts"]).
        """
        # Call imap.list('""', '*')
        # Parse LIST response lines
        # Extract folder names from response format: (* LIST (...) "/" "INBOX")
        # Return cleaned folder names

    async def close(self, imap: IMAP4_SSL) -> None:
        """
        Properly close IMAP connection.

        CRITICAL: Must use logout(), not close().
        close() only closes mailbox, logout() closes TCP connection.
        """
        # Try/except for best-effort cleanup
        # Call await imap.logout()
```
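
The LIST parsing step outlined in `list_folders` could be sketched as follows. This is a minimal sketch, not the RESEARCH.md implementation: `parse_list_lines` and `_LIST_RE` are hypothetical names, and the shape of the response lines (bytes, with or without a leading `* LIST `, followed by a status line) is an assumption that should be checked against what the client actually returns.

```python
import re

# Assumed line shape: optional "* LIST " prefix, (flags), a quoted delimiter
# or NIL, then the folder name (quoted or bare).
_LIST_RE = re.compile(
    r'^(?:\* LIST )?\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)$'
)


def parse_list_lines(lines: list[bytes]) -> list[str]:
    """Extract folder names from raw LIST response lines (hypothetical helper)."""
    folders = []
    for raw in lines:
        line = raw.decode("utf-8", errors="replace").strip()
        match = _LIST_RE.match(line)
        if match is None:
            # Not a LIST entry, e.g. a trailing status line; skip it.
            continue
        name = match.group("name").strip()
        if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
            name = name[1:-1]  # drop the surrounding quotes
        folders.append(name)
    return folders
```

The sketch skips lines that do not look like LIST entries instead of raising, and it does not decode modified UTF-7 folder names or escaped quotes inside names.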

Key implementation details:
- Import `IMAP4_SSL` from aioimaplib
- Use `await imap.wait_hello_from_server()` after instantiation
- Use `await imap.login(username, password)` for authentication
- Always call `logout()` not `close()` to close TCP connection
- Handle connection errors with try/except and logger.error
- Use logger with prefix `[IMAP]` for operations and `[IMAP ERROR]` for failures
- Follow async patterns from existing service classes (ynab_service.py, mealie_service.py)

**Anti-patterns to avoid** (from RESEARCH.md lines 331-339):
- Don't use imap.close() for disconnect (only closes mailbox)
- Don't share one connection across concurrent tasks (a connection is not safe for concurrent use)
- Don't skip cleanup: always call logout() in a finally block
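
The cleanup rules above can be illustrated without a mail server. The sketch below is an assumption-heavy illustration: `FakeIMAPClient` is a hypothetical stand-in that only mirrors the method names mentioned above (`wait_hello_from_server`, `login`, `logout`), and it signals a failed login by raising, which the real client may not do (it may report failure through a response status instead).

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeIMAPClient:
    """Hypothetical stand-in for an IMAP client; NOT the aioimaplib API."""

    def __init__(self, fail_login: bool = False) -> None:
        self.fail_login = fail_login
        self.logged_out = False

    async def wait_hello_from_server(self) -> None:
        await asyncio.sleep(0)

    async def login(self, username: str, password: str) -> None:
        if self.fail_login:  # assumption: failure is signalled by raising
            raise ConnectionError("authentication failed")

    async def logout(self) -> None:
        self.logged_out = True


async def close(client: FakeIMAPClient) -> None:
    """Best-effort cleanup: a logout failure must not mask the original error."""
    try:
        await client.logout()
    except Exception as exc:
        logger.warning("[IMAP] logout failed: %s", exc)


async def connect(client: FakeIMAPClient, username: str, password: str) -> FakeIMAPClient:
    """Authenticate; on any failure, log out and re-raise."""
    try:
        await client.wait_hello_from_server()
        await client.login(username, password)
        return client
    except Exception:
        logger.error("[IMAP ERROR] connect failed for %s", username)
        await close(client)
        raise


async def demo() -> tuple[bool, bool]:
    ok = FakeIMAPClient()
    try:
        await connect(ok, "user", "secret")
    finally:
        await close(ok)  # always runs, mirroring "logout() in a finally block"

    bad = FakeIMAPClient(fail_login=True)
    try:
        await connect(bad, "user", "wrong")
    except ConnectionError:
        pass  # connect() already logged out before re-raising
    return ok.logged_out, bad.logged_out
```

In both paths the client ends up logged out: the caller's `finally` covers the success case, and `connect` itself covers the failure case before the exception propagates.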
</action>
<verify>
- `cat blueprints/email/imap_service.py` shows IMAPService class with connect/list_folders/close methods
- `python -c "from blueprints.email.imap_service import IMAPService; print('✓ IMAPService imports')"` succeeds
- `grep "await imap.logout()" blueprints/email/imap_service.py` shows proper cleanup
- `grep "aioimaplib" pyproject.toml` shows dependency added
</verify>
<done>IMAPService class exists with async connect/list_folders/close methods, uses aioimaplib correctly with logout() for cleanup, dependencies added to pyproject.toml</done>
</task>

<task type="auto">
<name>Task 2: Create email body parser for multipart MIME messages</name>
<files>
blueprints/email/parser_service.py
</files>
<action>
Create parser_service.py with email parsing function following RESEARCH.md patterns (lines 190-239, 494-577):

```python
import logging
from email import message_from_bytes
from email.policy import default
from email.utils import parsedate_to_datetime
from typing import Optional
import html2text

logger = logging.getLogger(__name__)


def parse_email_body(raw_email_bytes: bytes) -> dict:
    """
    Extract text and HTML bodies from RFC822 email bytes.

    Args:
        raw_email_bytes: Raw email message bytes from IMAP FETCH

    Returns:
        Dictionary with keys:
        - "text": Plain text body (None if not present)
        - "html": HTML body (None if not present)
        - "preferred": Best available body (text preferred, HTML converted if text missing)
        - "subject": Email subject
        - "from": Sender address
        - "to": Recipient address(es)
        - "date": Parsed datetime object
        - "message_id": RFC822 Message-ID header
    """
    # Parse with modern EmailMessage API and default policy
    # Use msg.get_body(preferencelist=('plain',)) for text part
    # Use msg.get_body(preferencelist=('html',)) for HTML part
    # Call get_content() on parts for proper decoding (not get_payload())
    # If text exists: preferred = text
    # If text missing and HTML exists: convert HTML to text with html2text
    # Extract metadata: subject, from, to, date, message-id
    # Use parsedate_to_datetime() for date parsing
    # Return dictionary with all fields
```

Implementation details:
- Use `message_from_bytes(raw_email_bytes, policy=default)` for modern API
- Use `msg.get_body(preferencelist=(...))` to handle multipart/alternative correctly
- Call `part.get_content()` not `part.get_payload()` for proper decoding (handles encoding automatically)
- For HTML conversion: `h = html2text.HTML2Text(); h.ignore_links = False; text = h.handle(html_body)`
- Handle missing headers gracefully: `msg.get("header-name", "")` returns empty string if missing
- Use `parsedate_to_datetime()` from email.utils to parse Date header into datetime object
- Log errors with `[EMAIL PARSER]` prefix
- Handle UnicodeDecodeError by logging and returning partial data
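
The body selection and header handling listed above might look like the following with only the standard library. This is a sketch under stated assumptions, not the RESEARCH.md example: `extract_bodies` is a hypothetical name, it returns only a subset of the planned keys, and the html2text conversion is deliberately left out so the block has no third-party dependency.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default
from email.utils import parsedate_to_datetime


def extract_bodies(raw: bytes) -> dict:
    """Return text/HTML bodies plus two headers (illustrative subset)."""
    msg = message_from_bytes(raw, policy=default)
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))

    date_header = msg.get("Date")
    try:
        date = parsedate_to_datetime(str(date_header)) if date_header else None
    except (TypeError, ValueError):
        date = None  # malformed Date header: keep going with partial data

    return {
        # Compare against None explicitly: a message object's truthiness
        # depends on its header count, not on whether it exists.
        "text": text_part.get_content() if text_part is not None else None,
        "html": html_part.get_content() if html_part is not None else None,
        "subject": str(msg.get("Subject", "")),
        "date": date,
    }


# Build a multipart/alternative message to exercise both body lookups.
_msg = EmailMessage()
_msg["Subject"] = "Test Email"
_msg["Date"] = "Sat, 7 Feb 2026 10:00:00 -0800"
_msg.set_content("Plain body")
_msg.add_alternative("<p>HTML body</p>", subtype="html")
result = extract_bodies(_msg.as_bytes())
```

For an HTML-only message the same function yields `"text": None`, which is the case where the planned `"preferred"` field would fall back to converted HTML.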

**Key insight from RESEARCH.md** (lines 389-399):
- Use `email.policy.default` for modern encoding handling
- Call `get_content()` not `get_payload()` to avoid encoding issues
- Prefer plain text over HTML for RAG indexing (less boilerplate)
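
The difference between the two calls is easiest to see on a body with a transfer encoding. The snippet below is a small illustration with a made-up base64 message (the subject and body text are placeholders): `get_payload()` without arguments hands back the still-encoded text, while `get_content()` undoes both the transfer encoding and the charset.

```python
import base64
from email import message_from_bytes
from email.policy import default

body = "H\u00e9llo w\u00f6rld"  # non-ASCII text, so decoding matters
raw = (
    b"Subject: Encoded\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body.encode("utf-8")) + b"\r\n"
)

msg = message_from_bytes(raw, policy=default)
payload = msg.get_payload()  # raw base64 text, not yet decoded
content = msg.get_content()  # decoded str using the declared charset
```

Indexing `payload` here would put base64 noise into the RAG store, which is the encoding issue the bullet above warns about.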

Follow function signature and return type from RESEARCH.md Example 3 (lines 196-238).
</action>
<verify>
- `cat blueprints/email/parser_service.py` shows parse_email_body function
- `python -c "from blueprints.email.parser_service import parse_email_body; print('✓ Parser imports')"` succeeds
- `grep "message_from_bytes" blueprints/email/parser_service.py` shows stdlib email module usage
- `grep "get_body" blueprints/email/parser_service.py` shows modern EmailMessage API usage
- `grep "html2text" blueprints/email/parser_service.py` shows HTML conversion
</verify>
<done>parse_email_body function exists, extracts text/HTML bodies using modern email.message API, converts HTML to text when needed, returns complete metadata dictionary</done>
</task>

</tasks>

<verification>
After task completion:
1. Test IMAP connection (requires test IMAP server or skip):
```python
from blueprints.email.imap_service import IMAPService
import asyncio

async def test():
    service = IMAPService()
    # Connect to test server (e.g., imap.gmail.com)
    # Test will be done in Phase 2 with real accounts
    print("✓ IMAPService ready for testing")

asyncio.run(test())
```

2. Test email parsing with sample RFC822 message:
```python
from blueprints.email.parser_service import parse_email_body

# Create minimal RFC822 message
sample = b"""From: sender@example.com
To: recipient@example.com
Subject: Test Email
Message-ID: <test123@example.com>
Date: Sat, 7 Feb 2026 10:00:00 -0800
Content-Type: text/plain; charset="utf-8"

This is the email body.
"""

result = parse_email_body(sample)
assert result["subject"] == "Test Email"
assert "email body" in result["text"]
assert result["preferred"] is not None
print("✓ Email parsing works")
```

3. Verify dependencies installed: `pip list | grep -E "(aioimaplib|html2text)"` shows both packages
</verification>

<success_criteria>
- IMAPService can establish connection with host/username/password/port parameters
- IMAPService.connect() returns authenticated IMAP4_SSL client
- IMAPService.list_folders() parses IMAP LIST response and returns folder names
- IMAPService.close() calls logout() for proper TCP cleanup
- parse_email_body() extracts text and HTML bodies from RFC822 bytes
- parse_email_body() prefers plain text over HTML for "preferred" field
- parse_email_body() converts HTML to text using html2text when text body missing
- parse_email_body() extracts all metadata: subject, from, to, date, message_id
- IMAPService follows the async patterns, and both services follow the logging conventions, of the existing codebase
- Dependencies (aioimaplib, html2text) added to pyproject.toml and installed
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md`
</output>