simbarag/.planning/phases/01-foundation/01-02-PLAN.md
Ryan Chen 800c6fef7f docs(01): create phase plan
Phase 01: Foundation
- 2 plan(s) in 2 wave(s)
- 1 parallel, 1 sequential
- Ready for execution
2026-02-07 13:35:48 -05:00

---
phase: 01-foundation
plan: 02
type: execute
wave: 2
depends_on:
  - 01-01
files_modified:
  - blueprints/email/imap_service.py
  - blueprints/email/parser_service.py
  - pyproject.toml
autonomous: true
must_haves:
  truths:
    - "IMAP service can connect to mail server and authenticate with credentials"
    - "IMAP service can list mailbox folders and return parsed folder names"
    - "Email parser extracts plain text and HTML bodies from multipart messages"
    - "Email parser handles emails with only text, only HTML, or both formats"
  artifacts:
    - path: blueprints/email/imap_service.py
      provides: "IMAP connection and folder listing"
      min_lines: 60
      exports:
        - IMAPService
    - path: blueprints/email/parser_service.py
      provides: "Email body parsing from RFC822 bytes"
      min_lines: 50
      exports:
        - parse_email_body
    - path: pyproject.toml
      provides: "aioimaplib and html2text dependencies"
      contains: "aioimaplib"
  key_links:
    - from: blueprints/email/imap_service.py
      to: aioimaplib.IMAP4_SSL
      via: "import and instantiation"
      pattern: "from aioimaplib import IMAP4_SSL"
    - from: blueprints/email/parser_service.py
      to: email.message_from_bytes
      via: "stdlib email module"
      pattern: "from email import message_from_bytes"
    - from: blueprints/email/imap_service.py
      to: blueprints/email/models.EmailAccount
      via: "type hints for account parameter"
      pattern: "account: EmailAccount"
---

Build IMAP connection utility and email parsing service for retrieving and processing email messages.

Purpose: Create the integration layer that communicates with IMAP mail servers and parses RFC822 email format into usable text content. These services enable the system to fetch emails and extract meaningful text for RAG indexing.

Output: IMAPService class with async connection handling, folder listing, and proper cleanup. Email parsing function that extracts text/HTML bodies from multipart MIME messages.

<execution_context> @/Users/ryanchen/.claude/get-shit-done/workflows/execute-plan.md @/Users/ryanchen/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md
@blueprints/email/models.py
@utils/ynab_service.py
@utils/mealie_service.py

Task 1: Implement IMAP connection service with authentication and folder listing

Files: blueprints/email/imap_service.py, pyproject.toml

**1. Add dependencies to pyproject.toml:**

- Add to `dependencies` array: `"aioimaplib>=2.0.1"` and `"html2text>=2025.4.15"`
- Run `pip install aioimaplib html2text` to install

**2. Create imap_service.py with IMAPService class:**

Implement async IMAP client following patterns from RESEARCH.md (lines 116-188, 494-577):

```python
import asyncio
import logging
from typing import Optional
from aioimaplib import IMAP4_SSL

logger = logging.getLogger(__name__)

class IMAPService:
    """Async IMAP client for email operations."""

    async def connect(
        self,
        host: str,
        username: str,
        password: str,
        port: int = 993,
        timeout: int = 10
    ) -> IMAP4_SSL:
        """
        Establish IMAP connection with authentication.

        Returns authenticated IMAP4_SSL client.
        Raises exception on connection or auth failure.
        Must call close() to properly disconnect.
        """
        # Create connection with timeout
        # Wait for server greeting
        # Authenticate with login()
        # Return authenticated client
        # On failure: call logout() and raise

    async def list_folders(self, imap: IMAP4_SSL) -> list[str]:
        """
        List all mailbox folders.

        Returns list of folder names (e.g., ["INBOX", "Sent", "Drafts"]).
        """
        # Call imap.list('""', '*')
        # Parse LIST response lines
        # Extract folder names from response format: (* LIST (...) "/" "INBOX")
        # Return cleaned folder names

    async def close(self, imap: IMAP4_SSL) -> None:
        """
        Properly close IMAP connection.

        CRITICAL: Must use logout(), not close().
        close() only closes mailbox, logout() closes TCP connection.
        """
        # Try/except for best-effort cleanup
        # Call await imap.logout()
```
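
The LIST parsing step described in the `list_folders` comments can be sketched on its own as a pure function. This is a minimal sketch rather than the planned implementation: the helper name `parse_list_lines` and the regular expression are illustrative assumptions, and it assumes each response line arrives as bytes (or str), with or without a leading `* LIST` prefix. Folder names with escaped quotes or modified UTF-7 encoding are not handled here.

```python
import re

# Matches the tail of a LIST line: (flags) delimiter name
# e.g. (\HasNoChildren) "/" "INBOX"
_LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\)\s+(?P<delim>"[^"]*"|NIL)\s+(?P<name>.+)$'
)


def parse_list_lines(lines):
    """Return folder names from IMAP LIST response lines (hypothetical helper)."""
    folders = []
    for line in lines:
        if isinstance(line, bytes):
            text = line.decode("utf-8", errors="replace")
        else:
            text = line
        match = _LIST_LINE.search(text.strip())
        if match is None:
            # Status lines such as "LIST completed" carry no folder entry.
            continue
        # Drop surrounding quotes from quoted folder names.
        folders.append(match.group("name").strip().strip('"'))
    return folders


sample_lines = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\HasNoChildren) "/" "Sent Mail"',
    b'* LIST (\\Noselect) "/" Drafts',
    b"LIST completed",
]
print(parse_list_lines(sample_lines))
```

Keeping the parsing separate from the network call means it can be checked against captured response lines without a live IMAP server.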

Key implementation details:

  • Import IMAP4_SSL from aioimaplib
  • Use await imap.wait_hello_from_server() after instantiation
  • Use await imap.login(username, password) for authentication
  • Always call logout() not close() to close TCP connection
  • Handle connection errors with try/except and logger.error
  • Use logger with prefix [IMAP] for operations and [IMAP ERROR] for failures
  • Follow async patterns from existing service classes (ynab_service.py, mealie_service.py)
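
The connect and cleanup flow listed above might look roughly like the following. This is a sketch under stated assumptions, not the final `IMAPService`: the function names `connect_imap` and `close_imap` and the `client_factory` parameter are hypothetical (the factory is injected only so the flow can be exercised without a mail server), and it assumes `login()` returns an object whose `result` attribute is `"OK"` on success.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def close_imap(imap, timeout: float = 5.0) -> None:
    """Best-effort disconnect: logout() ends the session, close() would not."""
    try:
        await asyncio.wait_for(imap.logout(), timeout=timeout)
    except Exception as exc:
        # Cleanup must never mask the error that triggered it.
        logger.warning("[IMAP] Logout failed during cleanup: %s", exc)


async def connect_imap(client_factory, host, username, password,
                       port=993, timeout=10):
    """Create a client, wait for the greeting, log in, and return the client."""
    imap = client_factory(host=host, port=port, timeout=timeout)
    try:
        await imap.wait_hello_from_server()
        response = await imap.login(username, password)
        # Assumption: a rejected login is reported via response.result,
        # not raised as an exception.
        if response.result != "OK":
            raise ConnectionError(f"IMAP login rejected for {username}@{host}")
    except Exception as exc:
        logger.error("[IMAP ERROR] Connection to %s failed: %s", host, exc)
        await close_imap(imap)
        raise
    logger.info("[IMAP] Connected to %s as %s", host, username)
    return imap
```

With aioimaplib installed, the call would presumably be `await connect_imap(IMAP4_SSL, host, username, password)`, followed by `await close_imap(imap)` in a `finally` block once the caller is done with the connection.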

Anti-patterns to avoid (from RESEARCH.md lines 331-339):

  • Don't use imap.close() for disconnect (only closes mailbox)
  • Don't share connections across tasks (not thread-safe)
  • Don't skip cleanup: always call logout() in a finally block

Verification:

  • `cat blueprints/email/imap_service.py` shows IMAPService class with connect/list_folders/close methods
  • `python -c "from blueprints.email.imap_service import IMAPService; print('✓ IMAPService imports')"` succeeds
  • `grep "await imap.logout()" blueprints/email/imap_service.py` shows proper cleanup
  • `grep "aioimaplib" pyproject.toml` shows dependency added

Done when: IMAPService class exists with async connect/list_folders/close methods, uses aioimaplib correctly with logout() for cleanup, and dependencies are added to pyproject.toml.

Task 2: Create email body parser for multipart MIME messages

Files: blueprints/email/parser_service.py

Create parser_service.py with email parsing function following RESEARCH.md patterns (lines 190-239, 494-577):

```python
import logging
from email import message_from_bytes
from email.policy import default
from email.utils import parsedate_to_datetime
from typing import Optional
import html2text

logger = logging.getLogger(__name__)

def parse_email_body(raw_email_bytes: bytes) -> dict:
    """
    Extract text and HTML bodies from RFC822 email bytes.

    Args:
        raw_email_bytes: Raw email message bytes from IMAP FETCH

    Returns:
        Dictionary with keys:
        - "text": Plain text body (None if not present)
        - "html": HTML body (None if not present)
        - "preferred": Best available body (text preferred, HTML converted if text missing)
        - "subject": Email subject
        - "from": Sender address
        - "to": Recipient address(es)
        - "date": Parsed datetime object
        - "message_id": RFC822 Message-ID header
    """
    # Parse with modern EmailMessage API and default policy
    # Use msg.get_body(preferencelist=('plain',)) for text part
    # Use msg.get_body(preferencelist=('html',)) for HTML part
    # Call get_content() on parts for proper decoding (not get_payload())
    # If text exists: preferred = text
    # If text missing and HTML exists: convert HTML to text with html2text
    # Extract metadata: subject, from, to, date, message-id
    # Use parsedate_to_datetime() for date parsing
    # Return dictionary with all fields
```
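
The body-selection steps in the comments above can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the planned `parse_email_body`: the names `extract_bodies` and `_html_to_text` and the `html_to_text` parameter are hypothetical, and html2text is imported lazily here only so the text-first path can be run without the dependency installed.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def _html_to_text(html: str) -> str:
    # Lazy import: an assumption of this sketch, the plan imports at module level.
    import html2text

    converter = html2text.HTML2Text()
    converter.ignore_links = False
    return converter.handle(html)


def extract_bodies(raw_email_bytes: bytes, html_to_text=_html_to_text) -> dict:
    """Return text, html, and preferred bodies (hypothetical helper)."""
    msg = message_from_bytes(raw_email_bytes, policy=default)
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    # get_content() decodes transfer encoding and charset for us.
    text = text_part.get_content() if text_part is not None else None
    html = html_part.get_content() if html_part is not None else None
    if text is not None:
        preferred = text
    elif html is not None:
        preferred = html_to_text(html)
    else:
        preferred = None
    return {"text": text, "html": html, "preferred": preferred}


# Usage: a multipart/alternative message carrying both formats.
demo = EmailMessage()
demo.set_content("plain body")
demo.add_alternative("<p>html body</p>", subtype="html")
bodies = extract_bodies(demo.as_bytes())
print(bodies["preferred"])
```

Asking `get_body()` for one subtype at a time keeps the text and HTML parts independent, so a message with only one of them yields `None` for the other instead of a fallback part.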

Implementation details:

  • Use message_from_bytes(raw_email_bytes, policy=default) for modern API
  • Use msg.get_body(preferencelist=(...)) to handle multipart/alternative correctly
  • Call part.get_content() not part.get_payload() for proper decoding (handles encoding automatically)
  • For HTML conversion: h = html2text.HTML2Text(); h.ignore_links = False; text = h.handle(html_body)
  • Handle missing headers gracefully: msg.get("header-name", "") returns empty string if missing
  • Use parsedate_to_datetime() from email.utils to parse Date header into datetime object
  • Log errors with [EMAIL PARSER] prefix
  • Handle UnicodeDecodeError by logging and returning partial data
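
The header handling in the list above could be sketched as follows. This is an illustration under assumptions rather than the planned code: `extract_metadata` is a hypothetical helper name, and a missing or unparseable Date header is assumed to map to `None` after logging.

```python
import logging
from datetime import datetime
from email import message_from_bytes
from email.policy import default
from email.utils import parsedate_to_datetime
from typing import Optional

logger = logging.getLogger(__name__)


def extract_metadata(raw_email_bytes: bytes) -> dict:
    """Return header metadata, tolerating missing headers (hypothetical helper)."""
    msg = message_from_bytes(raw_email_bytes, policy=default)
    date: Optional[datetime] = None
    try:
        raw_date = msg.get("Date", "")
        if raw_date:
            date = parsedate_to_datetime(str(raw_date))
    except (TypeError, ValueError) as exc:
        # Older Python versions raise TypeError, newer ones ValueError.
        logger.error("[EMAIL PARSER] Could not parse Date header: %s", exc)
    return {
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "date": date,
        "message_id": str(msg.get("Message-ID", "")),
    }


sample = (
    b"From: sender@example.com\n"
    b"To: recipient@example.com\n"
    b"Subject: Test Email\n"
    b"Message-ID: <test123@example.com>\n"
    b"Date: Sat, 7 Feb 2026 10:00:00 -0800\n"
    b"\n"
    b"This is the email body.\n"
)
metadata = extract_metadata(sample)
print(metadata["subject"], metadata["date"])
```

Returning empty strings for absent headers and `None` for an absent date keeps the result dictionary's keys stable, so callers do not need to guard every lookup.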

Key insights from RESEARCH.md (lines 389-399):

  • Use email.policy.default for modern encoding handling
  • Call get_content() not get_payload() to avoid encoding issues
  • Prefer plain text over HTML for RAG indexing (less boilerplate)

Follow function signature and return type from RESEARCH.md Example 3 (lines 196-238).

Verification:

- `cat blueprints/email/parser_service.py` shows parse_email_body function
- `python -c "from blueprints.email.parser_service import parse_email_body; print('✓ Parser imports')"` succeeds
- `grep "message_from_bytes" blueprints/email/parser_service.py` shows stdlib email module usage
- `grep "get_body" blueprints/email/parser_service.py` shows modern EmailMessage API usage
- `grep "html2text" blueprints/email/parser_service.py` shows HTML conversion

Done when: parse_email_body function exists, extracts text/HTML bodies using modern email.message API, converts HTML to text when needed, and returns complete metadata dictionary.

After task completion:

1. Test IMAP connection (requires test IMAP server or skip):
```python
from blueprints.email.imap_service import IMAPService
import asyncio

async def test():
    service = IMAPService()
    # Connect to test server (e.g., imap.gmail.com)
    # Test will be done in Phase 2 with real accounts
    print("✓ IMAPService ready for testing")

asyncio.run(test())
```

2. Test email parsing with sample RFC822 message:
```python
from blueprints.email.parser_service import parse_email_body

# Create minimal RFC822 message
sample = b"""From: sender@example.com
To: recipient@example.com
Subject: Test Email
Message-ID: <test123@example.com>
Date: Sat, 7 Feb 2026 10:00:00 -0800
Content-Type: text/plain; charset="utf-8"

This is the email body.
"""

result = parse_email_body(sample)
assert result["subject"] == "Test Email"
assert "email body" in result["text"]
assert result["preferred"] is not None
print("✓ Email parsing works")
```

3. Verify dependencies installed: `pip list | grep -E "(aioimaplib|html2text)"` shows both packages

<success_criteria>

  • IMAPService can establish connection with host/username/password/port parameters
  • IMAPService.connect() returns authenticated IMAP4_SSL client
  • IMAPService.list_folders() parses IMAP LIST response and returns folder names
  • IMAPService.close() calls logout() for proper TCP cleanup
  • parse_email_body() extracts text and HTML bodies from RFC822 bytes
  • parse_email_body() prefers plain text over HTML for "preferred" field
  • parse_email_body() converts HTML to text using html2text when text body missing
  • parse_email_body() extracts all metadata: subject, from, to, date, message_id
  • Both services follow the logging conventions of the existing codebase, and IMAPService follows its async patterns
  • Dependencies (aioimaplib, html2text) added to pyproject.toml and installed

</success_criteria>

After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md`