# Roadmap: SimbaRAG Email Integration

## Overview

Add IMAP email ingestion to SimbaRAG's existing document/finance/meal analytics capabilities. Admin users configure email accounts; the system syncs and embeds emails into ChromaDB on a schedule, automatically purges emails older than the retention period (30 days by default), and provides LangChain tools for inbox analytics through natural conversation.

## Phases

**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

- [ ] **Phase 1: Foundation** - Database models and IMAP utilities
- [ ] **Phase 2: Account Management** - Admin UI for configuring email accounts
- [ ] **Phase 3: Email Ingestion** - Sync engine, embeddings, retention cleanup
- [ ] **Phase 4: Query Tools** - LangChain tools for email analytics

## Phase Details

### Phase 1: Foundation
**Goal**: Core infrastructure for email ingestion is in place
**Depends on**: Nothing (first phase)
**Requirements**: None (foundational infrastructure)
**Success Criteria** (what must be TRUE):
1. Database tables exist for email accounts, sync status, and email metadata
2. IMAP connection utility can authenticate and list folders from test server
3. Email body parser extracts text from both plain text and HTML formats
4. Encryption utility securely stores and retrieves IMAP credentials
**Plans**: TBD

Plans:
- [ ] 01-01: TBD
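
As an illustration of Phase 1 success criterion 3, the following is a minimal sketch of a body parser built only on Python's standard library (`email` and `html.parser`). The names `extract_body_text` and `html_to_text` are hypothetical, and the choice to prefer the `text/plain` part over `text/html` is an assumption, not a decision recorded in this roadmap.

```python
from email import message_from_string
from email.message import Message
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its visible text (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def extract_body_text(msg: Message) -> str:
    """Return the text/plain body if present, else text/html as plain text."""
    plain, html = None, None
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            continue  # attachments are out of scope for this sketch
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        if ctype == "text/plain" and plain is None:
            plain = text
        elif ctype == "text/html" and html is None:
            html = text
    if plain is not None:
        return plain.strip()
    if html is not None:
        return html_to_text(html)
    return ""
```

A parsed message from `email.message_from_string` or `email.message_from_bytes` can be passed directly to `extract_body_text`. A production parser would likely also need to handle unknown charset names and malformed MIME structures, which this sketch does not.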

### Phase 2: Account Management
**Goal**: Admin users can configure and manage IMAP email accounts
**Depends on**: Phase 1
**Requirements**: ACCT-01, ACCT-02, ACCT-03, ACCT-04, ACCT-05, ACCT-06, ACCT-07
**Success Criteria** (what must be TRUE):
1. Admin can add new IMAP account with host, port, username, password, folder selection
2. Admin can test IMAP connection and see success/failure before saving
3. Admin can view list of configured accounts with masked credentials
4. Admin can edit existing account configuration and delete accounts
5. Only users in lldap_admin group can access email account endpoints
**Plans**: TBD

Plans:
- [ ] 02-01: TBD
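
To make Phase 2 success criterion 3 concrete, here is a small sketch of how an account could be serialized for the admin list view without exposing its password. The `EmailAccount` shape and the `masked_view` function are hypothetical (the real model comes out of Phase 1), and masking only the password, not the username, is an assumption.

```python
from dataclasses import dataclass, field

# Fixed-length mask so the response does not leak the password length.
MASK = "********"


@dataclass
class EmailAccount:
    """Hypothetical account shape using the fields named in criterion 1."""
    host: str
    port: int
    username: str
    password: str
    folders: list[str] = field(default_factory=lambda: ["INBOX"])


def masked_view(account: EmailAccount) -> dict:
    """Return a dict that is safe to send to the admin UI."""
    return {
        "host": account.host,
        "port": account.port,
        "username": account.username,
        "password": MASK,  # never echo the stored secret back
        "folders": list(account.folders),
    }
```

Building the response from an explicit allow-list of fields, rather than copying the model and deleting the secret, means a field added to the model later is not exposed by accident.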

### Phase 3: Email Ingestion
**Goal**: System automatically syncs emails, creates embeddings, and purges old content
**Depends on**: Phase 2
**Requirements**: SYNC-01, SYNC-02, SYNC-03, SYNC-04, SYNC-05, SYNC-06, SYNC-07, SYNC-08, SYNC-09, RETN-01, RETN-02, RETN-03, RETN-04, RETN-05
**Success Criteria** (what must be TRUE):
1. System connects to configured IMAP accounts and fetches messages from selected folders
2. System parses email metadata (subject, sender, date) and extracts body text from plain/HTML
3. System generates embeddings and stores emails in ChromaDB with metadata
4. System performs scheduled sync at configurable intervals (default hourly)
5. System tracks last sync timestamp and performs incremental sync (only new emails)
6. System automatically purges emails older than retention period (default 30 days)
7. Admin can view sync logs showing success/failure, counts, and errors
**Plans**: TBD

Plans:
- [ ] 03-01: TBD
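
Phase 3 criteria 5 and 6 (incremental sync and retention purge) both reduce to timestamp comparisons. The sketch below shows one way to express them as pure functions; `EmailRecord`, `select_for_ingest`, and `select_for_purge` are hypothetical names, and skipping messages that are already past the retention cutoff at ingest time is an assumption rather than a stated requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class EmailRecord:
    """Hypothetical minimal record; the real model is defined in Phase 1."""
    uid: str
    received_at: datetime


def retention_cutoff(now: datetime, retention_days: int = 30) -> datetime:
    """Oldest timestamp that is still inside the retention window."""
    return now - timedelta(days=retention_days)


def select_for_ingest(
    fetched: Iterable[EmailRecord],
    last_sync: Optional[datetime],
    now: datetime,
    retention_days: int = 30,
) -> list[EmailRecord]:
    """Keep messages newer than the last sync and inside the retention window.

    last_sync=None models the first sync, where everything still inside
    the retention window is ingested.
    """
    cutoff = retention_cutoff(now, retention_days)
    return [
        m for m in fetched
        if m.received_at >= cutoff
        and (last_sync is None or m.received_at > last_sync)
    ]


def select_for_purge(
    stored: Iterable[EmailRecord],
    now: datetime,
    retention_days: int = 30,
) -> list[EmailRecord]:
    """Return stored messages that have aged out of the retention window."""
    cutoff = retention_cutoff(now, retention_days)
    return [m for m in stored if m.received_at < cutoff]
```

Passing `now` in explicitly keeps both functions deterministic, which makes the sync and purge rules easy to unit test independently of the scheduler and of ChromaDB.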

### Phase 4: Query Tools
**Goal**: Admin users can query email content through conversational interface
**Depends on**: Phase 3
**Requirements**: QUERY-01, QUERY-02, QUERY-03, QUERY-04, QUERY-05, QUERY-06
**Success Criteria** (what must be TRUE):
1. LangChain agent has tool to search emails by content, sender, or date range
2. Agent can identify most frequent senders in a timeframe
3. Agent can analyze subject lines and identify common topics
4. Agent can detect subscription/newsletter patterns (recurring senders, unsubscribe links)
5. Agent can answer time-based queries ("emails this week", "emails in January")
6. Only admin users can query email content via conversation interface
**Plans**: TBD

Plans:
- [ ] 04-01: TBD
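
The analytics behind Phase 4 criteria 2 and 4 can be sketched as plain functions over email metadata, which an agent tool could then wrap. Everything named here is hypothetical: the `EmailMeta` shape, the `has_unsubscribe` flag (assumed to be set during ingestion when an unsubscribe link or header is found), and the threshold of two messages for a "recurring" sender. No LangChain API is used, so the sketch stays independent of how the tools are eventually registered.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class EmailMeta:
    """Hypothetical metadata record stored alongside each embedded email."""
    sender: str
    subject: str
    received_at: datetime
    has_unsubscribe: bool = False


def top_senders(
    emails: Iterable[EmailMeta],
    start: datetime,
    end: datetime,
    limit: int = 5,
) -> list[tuple[str, int]]:
    """Most frequent senders with start <= received_at < end."""
    counts = Counter(
        e.sender.lower() for e in emails if start <= e.received_at < end
    )
    return counts.most_common(limit)


def likely_newsletters(
    emails: Iterable[EmailMeta],
    min_count: int = 2,
) -> list[str]:
    """Senders that recur and whose messages carry an unsubscribe marker."""
    counts = Counter(e.sender.lower() for e in emails if e.has_unsubscribe)
    return sorted(s for s, n in counts.items() if n >= min_count)
```

Using a half-open interval (`start <= received_at < end`) lets time-based queries such as "emails in January" be expressed as consecutive ranges without counting a boundary message twice.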

## Progress

**Execution Order:**
Phases execute in numeric order: 1 → 2 → 3 → 4

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Foundation | 0/1 | Not started | - |
| 2. Account Management | 0/1 | Not started | - |
| 3. Email Ingestion | 0/1 | Not started | - |
| 4. Query Tools | 0/1 | Not started | - |