# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
`yottob` is a Flask-based web application for processing YouTube RSS feeds with SQLAlchemy ORM persistence and async video downloads. The project provides both a REST API and CLI interface for fetching and parsing YouTube channel feeds, with filtering logic to exclude YouTube Shorts. All fetched feeds are automatically saved to a PostgreSQL database for historical tracking. Videos can be downloaded asynchronously as MP4 files using Celery workers and yt-dlp.
The application is containerized with Docker and uses docker-compose to orchestrate multiple services: PostgreSQL, Redis, Flask web app, and Celery worker.
## Quick Start with Docker Compose (Recommended)
**Prerequisites:**
- Docker and Docker Compose installed
- No additional dependencies needed
**Start all services:**
```bash
# Copy environment variables template
cp .env.example .env
# Start all services (postgres, redis, app, celery)
docker-compose up -d
# View logs
docker-compose logs -f
# Stop all services
docker-compose down
# Stop and remove volumes (deletes database data)
docker-compose down -v
```
**Run database migrations (first time setup or after model changes):**
```bash
docker-compose exec app alembic upgrade head
```
**Access the application:**
- Web API: http://localhost:5000
- PostgreSQL: localhost:5432
- Redis: localhost:6379
## Development Setup (Local Without Docker)
This project uses `uv` for dependency management.
**Install dependencies:**
```bash
uv sync
```
**Activate virtual environment:**
```bash
source .venv/bin/activate # On macOS/Linux
```
**Set up environment variables:**
```bash
cp .env.example .env
# Edit .env with your local configuration
```
**Start PostgreSQL (choose one):**
```bash
# Using Docker
docker run -d -p 5432:5432 \
  -e POSTGRES_USER=yottob \
  -e POSTGRES_PASSWORD=yottob_password \
  -e POSTGRES_DB=yottob \
  postgres:16-alpine
# Or use existing PostgreSQL installation
```
**Start Redis:**
```bash
# macOS with Homebrew
brew services start redis
# Linux
sudo systemctl start redis
# Docker
docker run -d -p 6379:6379 redis:alpine
```
**Initialize/update database:**
```bash
source .venv/bin/activate && alembic upgrade head
```
**Start Celery worker (required for video downloads):**
```bash
source .venv/bin/activate && celery -A celery_app worker --loglevel=info
```
## Running the Application
**With Docker Compose:**
```bash
docker-compose up
```
**Local development:**
```bash
# Run the CLI feed parser
python main.py
# Run the Flask web application
flask --app main run
```
## User Authentication
The application includes a complete user authentication system using Flask-Login and bcrypt:
**Authentication Pages:**
- `/register` - User registration with email and password
- `/login` - User login with "remember me" functionality
- `/logout` - User logout
**Security Features:**
- Passwords hashed with bcrypt and salt
- Session-based authentication via Flask-Login
- Protected routes with `@login_required` decorator
- User-specific data isolation (multi-tenant architecture)
- Secure password requirements (minimum 8 characters)
- Flash messages for all auth actions
- Redirect to requested page after login
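The `set_password()`/`check_password()` pattern on the User model (described under Database Layer) follows the standard salt-then-verify flow. As a self-contained sketch — the real models.py uses bcrypt, but the stdlib `pbkdf2_hmac` stands in here to avoid a third-party dependency:

```python
import hashlib
import hmac
import os

# Illustrative sketch of the salted hash-and-verify pattern behind
# User.set_password()/check_password(). models.py uses bcrypt; pbkdf2_hmac
# stands in so this sketch needs no third-party packages.

def set_password(password: str) -> tuple[bytes, bytes]:
    if len(password) < 8:  # mirrors the minimum-8-characters requirement
        raise ValueError("password must be at least 8 characters")
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison, as bcrypt's check also provides.
    return hmac.compare_digest(candidate, expected)
```

The key property is the same in both cases: the stored hash cannot be reversed, and verification re-derives the hash from the candidate password plus the stored salt.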
**First Time Setup:**
1. Start the application
2. Navigate to http://localhost:5000/register
3. Create an account
4. Log in and start subscribing to channels
**User Data Isolation:**
- Each user can only see their own channels and videos
- Channels are scoped by user_id
- All routes filter data by current_user.id
- Users cannot access other users' content
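Conceptually, every route applies a filter on `user_id`. The actual code uses Flask-Login's `current_user` with SQLAlchemy queries; this stdlib stand-in just shows the scoping rule:

```python
# Stdlib stand-in for the per-user scoping: routes apply the equivalent of
# this filter via current_user.id in their SQLAlchemy queries.
def channels_for_user(channels: list[dict], user_id: int) -> list[dict]:
    return [c for c in channels if c["user_id"] == user_id]
```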
## Frontend Interface
The application includes a full-featured web interface built with Jinja2 templates:
**Pages:** (all require authentication)
- `/` - Dashboard showing user's videos sorted by date (newest first)
- `/channels` - User's channel management page with refresh functionality
- `/add-channel` - Form to subscribe to new YouTube channels
- `/watch/<video_id>` - Video player page for watching downloaded videos
**Features:**
- User registration and login system
- Video grid with thumbnails and metadata
- Real-time download status indicators (pending, downloading, completed, failed)
- Inline video downloads from dashboard
- HTML5 video player for streaming downloaded videos
- Channel subscription and management
- Refresh individual channels to fetch new videos
- Responsive design for mobile and desktop
- User-specific navigation showing username
**API Endpoints:**
- `/api/feed` - Fetch YouTube channel feed and save to database (GET)
- `/api/channels` - List all tracked channels (GET)
- `/api/history/<channel_id>` - Get video history for a specific channel (GET)
- `/api/download/<video_id>` - Trigger video download (POST)
- `/api/download/status/<video_id>` - Check download status (GET)
- `/api/download/batch` - Batch download multiple videos (POST)
- `/api/videos/refresh/<channel_id>` - Refresh videos for a channel (POST)
- `/api/video/stream/<video_id>` - Stream or download video file (GET)
**API Usage Examples:**
```bash
# Fetch default channel feed (automatically saves to DB)
curl http://localhost:5000/api/feed
# Fetch specific channel with options
curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true"
# List all tracked channels
curl http://localhost:5000/api/channels
# Get video history for a channel (limit 20 videos)
curl "http://localhost:5000/api/history/CHANNEL_ID?limit=20"
# Trigger download for a specific video
curl -X POST http://localhost:5000/api/download/123
# Check download status
curl http://localhost:5000/api/download/status/123
# Batch download all pending videos for a channel
curl -X POST "http://localhost:5000/api/download/batch?channel_id=CHANNEL_ID&status=pending"
# Batch download specific video IDs
curl -X POST http://localhost:5000/api/download/batch \
  -H "Content-Type: application/json" \
  -d '{"video_ids": [1, 2, 3, 4, 5]}'
```
## Architecture
The codebase follows a clean layered architecture with separation of concerns:
### Database Layer
**`models.py`** - SQLAlchemy ORM models
- `Base`: Declarative base for all models
- `User`: User model with Flask-Login integration
  - Stores username, email, password_hash, created_at
  - Methods: `set_password()`, `check_password()` for bcrypt password handling
  - Implements UserMixin for Flask-Login compatibility
- `DownloadStatus`: Enum for download states (pending, downloading, completed, failed)
- `Channel`: Stores YouTube channel metadata per user
  - Fields: user_id, channel_id, title, link, rss_url, last_fetched_at
  - Unique constraint: (user_id, channel_id) - prevents a user from subscribing to the same channel twice
- `VideoEntry`: Stores individual video entries with full metadata
  - Fields: video_id, title, video_url, thumbnail_url, description, published_at
  - Download tracking: download_status, download_path, download_started_at, download_completed_at, download_error, file_size
  - Unique constraint: (video_id, channel_id) - prevents duplicate videos
- Relationships:
  - One User has many Channels
  - One Channel has many VideoEntries
**`database.py`** - Database configuration and session management
- `DATABASE_URL`: Database URL from environment variable (PostgreSQL in production, SQLite fallback for local dev)
- `engine`: SQLAlchemy engine instance
- `init_db()`: Creates all tables
- `get_db_session()`: Context manager for database sessions
### Async Task Queue Layer
**`celery_app.py`** - Celery configuration
- Celery instance configured with Redis broker
- Task serialization and worker configuration
- 1-hour task timeout with automatic retries
**`download_service.py`** - Video download tasks
- `download_video(video_id)`: Celery task to download a single video as MP4
  - Uses yt-dlp with MP4 format preference
  - Updates database with download progress and status
  - Automatic retry on failure (max 3 attempts)
- `download_videos_batch(video_ids)`: Queue multiple downloads
- Downloads saved to `downloads/` directory
### Core Logic Layer
**`feed_parser.py`** - Reusable YouTube feed parsing module
- `YouTubeFeedParser`: Main parser class that encapsulates channel-specific logic
- `FeedEntry`: In-memory data model for feed entries
- `fetch_feed()`: Fetches and parses RSS feeds
- `save_to_db()`: Persists feed data to database with upsert logic
- Independent of Flask - can be imported and used in any Python context
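The exact fields of `FeedEntry` aren't listed here; as an illustrative sketch, its shape can be assumed to mirror the video_entries columns documented under Database Schema (field names below are assumptions, not the actual feed_parser.py code):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical shape of FeedEntry; the field names are assumptions mirroring
# the video_entries columns, not the actual feed_parser.py definition.
@dataclass
class FeedEntry:
    video_id: str
    title: str
    video_url: str
    thumbnail_url: Optional[str] = None
    description: Optional[str] = None
    published_at: Optional[datetime] = None
```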
### Web Server Layer
**`main.py`** - Flask application and routes
**Frontend Routes:**
- `index()`: Dashboard page with all videos sorted by date (main.py:24)
- `channels_page()`: Channel management page (main.py:40)
- `add_channel_page()`: Add channel form and subscription handler (main.py:52)
- `watch_video()`: Video player page (main.py:94)
**API Routes:**
- `get_feed()`: Fetch YouTube feed and save to database (main.py:110)
- `get_channels()`: List all tracked channels (main.py:145)
- `get_history()`: Video history for a channel (main.py:172)
- `trigger_download()`: Queue video download task (main.py:216)
- `get_download_status()`: Check download status (main.py:258)
- `trigger_batch_download()`: Queue multiple downloads (main.py:290)
- `refresh_channel_videos()`: Refresh videos for a channel (main.py:347)
- `stream_video()`: Stream or download video file (main.py:391)
### Frontend Templates
**`templates/base.html`** - Base template with navigation and common layout
- Navigation bar with logo and menu
- Flash message display system
- Common styles and responsive design
**`templates/dashboard.html`** - Main video listing page
- Video grid sorted by published date (newest first)
- Thumbnail display with download status badges
- Inline download buttons for pending videos
- Empty state for new installations
**`templates/channels.html`** - Channel management interface
- List of subscribed channels with metadata
- Refresh button to fetch new videos per channel
- Link to add new channels
- Video count and last updated timestamps
**`templates/add_channel.html`** - Channel subscription form
- Form to input YouTube RSS feed URL
- Help section with instructions on finding RSS URLs
- Examples and format guidance
**`templates/watch.html`** - Video player page
- HTML5 video player for downloaded videos
- Download status placeholders (downloading, failed, pending)
- Video metadata (title, channel, publish date)
- Download button for pending videos
- Auto-refresh when video is downloading
**`static/style.css`** - Application styles
- Dark theme inspired by YouTube
- Responsive grid layout
- Video card components
- Form styling
- Badge and button components
## Feed Parsing Implementation
The `YouTubeFeedParser` class in `feed_parser.py`:
- Constructs YouTube RSS feed URLs from channel IDs
- Uses feedparser to fetch and parse feeds
- Validates HTTP 200 status before processing
- Optionally filters out YouTube Shorts (any entry with "shorts" in URL)
- Returns structured dictionary with feed metadata and entries
**YouTube RSS Feed URL Format:**
```
https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}
```
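The URL construction and Shorts filter described above can be sketched as follows (the entry dicts with a `"link"` key are an assumed shape, not the parser's actual types):

```python
# Sketch of the feed-URL construction and Shorts filtering described above.
FEED_URL_TEMPLATE = "https://www.youtube.com/feeds/videos.xml?channel_id={}"

def build_feed_url(channel_id: str) -> str:
    return FEED_URL_TEMPLATE.format(channel_id)

def is_short(video_url: str) -> bool:
    # The parser treats any entry with "shorts" in its URL as a Short.
    return "shorts" in video_url

def filter_shorts(entries: list[dict]) -> list[dict]:
    # Assumed entry shape: a dict with a "link" key holding the video URL.
    return [e for e in entries if not is_short(e["link"])]
```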
## Database Migrations
This project uses Alembic for database schema migrations.
**Create a new migration after model changes:**
```bash
source .venv/bin/activate && alembic revision --autogenerate -m "Description of changes"
```
**Apply migrations:**
```bash
source .venv/bin/activate && alembic upgrade head
```
**View migration history:**
```bash
source .venv/bin/activate && alembic history
```
**Rollback to previous version:**
```bash
source .venv/bin/activate && alembic downgrade -1
```
**Migration files location:** `alembic/versions/`
**Important notes:**
- Always review auto-generated migrations before applying
- The database is automatically initialized on Flask app startup via `init_db()`
- Migration configuration is in `alembic.ini` and `alembic/env.py`
- Models are imported in `alembic/env.py` for autogenerate support
## Database Schema
**users table:**
- `id`: Primary key
- `username`: Unique username (indexed)
- `email`: Unique email address (indexed)
- `password_hash`: Bcrypt-hashed password
- `created_at`: Timestamp when user registered
**channels table:**
- `id`: Primary key
- `user_id`: Foreign key to users.id (indexed)
- `channel_id`: YouTube channel ID (indexed)
- `title`: Channel title
- `link`: Channel URL
- `rss_url`: YouTube RSS feed URL
- `last_fetched_at`: Timestamp of last feed fetch
- Unique index: `idx_user_channel` on (user_id, channel_id) - prevents duplicate subscriptions
**video_entries table:**
- `id`: Primary key
- `channel_id`: Foreign key to channels.id
- `video_id`: YouTube video ID (indexed)
- `title`: Video title
- `video_url`: YouTube video URL (indexed)
- `thumbnail_url`: Video thumbnail URL
- `description`: Video description
- `published_at`: When video was published on YouTube (indexed)
- `created_at`: Timestamp when video was first recorded
- `download_status`: Enum (pending, downloading, completed, failed)
- `download_path`: Local file path to downloaded MP4
- `download_started_at`: When download began
- `download_completed_at`: When download finished
- `download_error`: Error message if download failed
- `file_size`: Size in bytes of downloaded file
- Unique index: `idx_video_id_channel` on (video_id, channel_id) - prevents duplicates
- Index: `idx_channel_created` on (channel_id, created_at) for fast queries
- Index: `idx_download_status` on download_status for filtering
- Index: `idx_published_at` on published_at for date sorting
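The effect of the `idx_user_channel` unique index can be demonstrated with an in-memory SQLite database (production runs PostgreSQL; the table below is trimmed to the columns the constraint involves):

```python
import sqlite3

# Minimal SQLite sketch of the idx_user_channel unique index; production uses
# PostgreSQL, and the table is trimmed to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE channels (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        channel_id TEXT NOT NULL
    )
""")
conn.execute("CREATE UNIQUE INDEX idx_user_channel ON channels (user_id, channel_id)")

conn.execute("INSERT INTO channels (user_id, channel_id) VALUES (1, 'UC123')")
conn.execute("INSERT INTO channels (user_id, channel_id) VALUES (2, 'UC123')")  # other user: allowed
try:
    # Same user, same channel: rejected by the unique index.
    conn.execute("INSERT INTO channels (user_id, channel_id) VALUES (1, 'UC123')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Two users may track the same channel, but a single user cannot subscribe to it twice.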
## Video Download System
The application uses Celery with Redis for asynchronous video downloads:
**Download Workflow:**
1. User triggers download via `/api/download/<video_id>` (POST)
2. VideoEntry status changes to "downloading"
3. Celery worker picks up task and uses yt-dlp to download as MP4
4. Progress updates written to database
5. On completion, status changes to "completed" with file path
6. On failure, status changes to "failed" with error message (auto-retry 3x)
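The workflow above implies a small state machine over the `DownloadStatus` values from models.py. The transition map here is an illustrative sketch, not project code (retries imply "failed" can move back to "downloading"):

```python
from enum import Enum

# DownloadStatus values as documented in models.py.
class DownloadStatus(Enum):
    PENDING = "pending"
    DOWNLOADING = "downloading"
    COMPLETED = "completed"
    FAILED = "failed"

# Transitions implied by the workflow steps above; illustrative only.
TRANSITIONS = {
    DownloadStatus.PENDING: {DownloadStatus.DOWNLOADING},
    DownloadStatus.DOWNLOADING: {DownloadStatus.COMPLETED, DownloadStatus.FAILED},
    DownloadStatus.FAILED: {DownloadStatus.DOWNLOADING},  # automatic retry
    DownloadStatus.COMPLETED: set(),
}

def can_transition(src: DownloadStatus, dst: DownloadStatus) -> bool:
    return dst in TRANSITIONS[src]
```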
**yt-dlp Configuration:**
- Format: `bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best`
- Output format: MP4 (converted if necessary using FFmpeg)
- Output location: `downloads/<video_id>_<title>.mp4`
- Progress hooks for real-time status updates
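An options dict matching this configuration might look like the sketch below. The keys (`format`, `outtmpl`, `merge_output_format`, `progress_hooks`) are standard yt-dlp option names; the exact dict in `download_service.py` may differ:

```python
# Sketch of yt-dlp options matching the configuration above; the exact dict
# in download_service.py may differ.

def on_progress(status: dict) -> None:
    # Placeholder for the hook that writes status updates to the database.
    pass

ydl_opts = {
    "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
    "outtmpl": "downloads/%(id)s_%(title)s.%(ext)s",
    "merge_output_format": "mp4",   # remux to MP4 via FFmpeg when needed
    "progress_hooks": [on_progress],
}
# Usage would be along the lines of:
#   yt_dlp.YoutubeDL(ydl_opts).download([video_url])
```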
**Requirements:**
- Redis server must be running (localhost:6379)
- Celery worker must be running to process downloads
- FFmpeg recommended for format conversion (yt-dlp will use it if available)
## Environment Variables
All environment variables can be configured in `.env` file (see `.env.example` for template):
- `DATABASE_URL`: PostgreSQL connection string (default: `sqlite:///yottob.db` for local dev)
- `CELERY_BROKER_URL`: Redis URL for Celery broker (default: `redis://localhost:6379/0`)
- `CELERY_RESULT_BACKEND`: Redis URL for Celery results (default: `redis://localhost:6379/0`)
- `FLASK_ENV`: Flask environment (development or production)
- `POSTGRES_USER`: PostgreSQL username (for docker-compose)
- `POSTGRES_PASSWORD`: PostgreSQL password (for docker-compose)
- `POSTGRES_DB`: PostgreSQL database name (for docker-compose)
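The defaults above can be applied with plain environment lookups; the actual reads live in `database.py` and `celery_app.py`:

```python
import os

# Sketch of reading the variables above with their documented defaults.
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///yottob.db")
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
CELERY_RESULT_BACKEND = os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/0")
```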
## Docker Compose Services
The application consists of 4 services defined in `docker-compose.yml`:
1. **postgres**: PostgreSQL 16 database with persistent volume
2. **redis**: Redis 7 message broker for Celery
3. **app**: Flask web application (exposed on port 5000)
4. **celery**: Celery worker for async video downloads
All services have health checks and automatic restarts configured.
## Dependencies
- **Flask 3.1.2+**: Web framework
- **Flask-Login 0.6.0+**: User session management
- **bcrypt 4.0.0+**: Password hashing
- **feedparser 6.0.12+**: RSS/Atom feed parsing
- **SQLAlchemy 2.0.0+**: ORM for database operations
- **psycopg2-binary 2.9.0+**: PostgreSQL database driver
- **Alembic 1.13.0+**: Database migration tool
- **Celery 5.3.0+**: Distributed task queue for async jobs
- **Redis 5.0.0+**: Message broker for Celery
- **yt-dlp 2024.0.0+**: YouTube video downloader
- **Python 3.14+**: Required runtime version