CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

yottob is a Flask-based web application for processing YouTube RSS feeds, with SQLAlchemy ORM persistence and asynchronous video downloads. It provides both a REST API and a CLI for fetching and parsing YouTube channel feeds, with filtering logic to exclude YouTube Shorts. Every fetched feed is automatically saved to the database (PostgreSQL in production, SQLite in local development) for historical tracking. Videos can be downloaded asynchronously as MP4 files by Celery workers using yt-dlp.

The application is containerized with Docker and uses docker-compose to orchestrate multiple services: PostgreSQL, Redis, Flask web app, and Celery worker.

Prerequisites:

  • Docker and Docker Compose installed
  • No additional host dependencies; everything else runs inside the containers

Start all services:

# Copy environment variables template
cp .env.example .env

# Start all services (postgres, redis, app, celery)
docker-compose up -d

# View logs
docker-compose logs -f

# Stop all services
docker-compose down

# Stop and remove volumes (deletes database data)
docker-compose down -v

Run database migrations (first time setup or after model changes):

docker-compose exec app alembic upgrade head

Access the application:

Once the stack is up, the web UI and API are served at http://localhost:5000.

Development Setup (Local Without Docker)

This project uses uv for dependency management.

Install dependencies:

uv sync

Activate virtual environment:

source .venv/bin/activate  # On macOS/Linux

Set up environment variables:

cp .env.example .env
# Edit .env with your local configuration

Start PostgreSQL (choose one):

# Using Docker
docker run -d -p 5432:5432 \
  -e POSTGRES_USER=yottob \
  -e POSTGRES_PASSWORD=yottob_password \
  -e POSTGRES_DB=yottob \
  postgres:16-alpine

# Or use existing PostgreSQL installation

Start Redis:

# macOS with Homebrew
brew services start redis

# Linux
sudo systemctl start redis  # the unit may be named redis-server on some distros

# Docker
docker run -d -p 6379:6379 redis:alpine

Initialize/update database:

source .venv/bin/activate && alembic upgrade head

Start Celery worker (required for video downloads):

source .venv/bin/activate && celery -A celery_app worker --loglevel=info

Running the Application

With Docker Compose:

docker-compose up

Local development:

# Run the CLI feed parser
python main.py

# Run the Flask web application
flask --app main run

The web server exposes:

  • / - Main page (renders index.html)
  • /api/feed - API endpoint for fetching feeds and saving to database
  • /api/channels - List all tracked channels
  • /api/history/<channel_id> - Get video history for a specific channel
  • /api/download/<video_id> - Trigger video download (POST)
  • /api/download/status/<video_id> - Check download status (GET)
  • /api/download/batch - Batch download multiple videos (POST)

API Usage Examples:

# Fetch default channel feed (automatically saves to DB)
curl http://localhost:5000/api/feed

# Fetch specific channel with options
curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true"

# List all tracked channels
curl http://localhost:5000/api/channels

# Get video history for a channel (limit 20 videos)
curl "http://localhost:5000/api/history/CHANNEL_ID?limit=20"

# Trigger download for a specific video
curl -X POST http://localhost:5000/api/download/123

# Check download status
curl http://localhost:5000/api/download/status/123

# Batch download all pending videos for a channel
curl -X POST "http://localhost:5000/api/download/batch?channel_id=CHANNEL_ID&status=pending"

# Batch download specific video IDs
curl -X POST http://localhost:5000/api/download/batch \
  -H "Content-Type: application/json" \
  -d '{"video_ids": [1, 2, 3, 4, 5]}'

Architecture

The codebase follows a layered architecture with clear separation of concerns:

Database Layer

models.py - SQLAlchemy ORM models

  • Base: Declarative base for all models
  • DownloadStatus: Enum for download states (pending, downloading, completed, failed)
  • Channel: Stores YouTube channel metadata (channel_id, title, link, last_fetched)
  • VideoEntry: Stores individual video entries with foreign key to Channel, plus download tracking fields:
    • download_status, download_path, download_started_at, download_completed_at, download_error, file_size
  • Relationships: one Channel has many VideoEntry records (see the model sketch below)
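
A minimal sketch of these models, assuming conventional SQLAlchemy 2.0 declarative style (field names follow the schema documented below; the real models.py may differ in detail):

import enum
from datetime import datetime

from sqlalchemy import DateTime, Enum, ForeignKey, Index, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class DownloadStatus(enum.Enum):
    pending = "pending"
    downloading = "downloading"
    completed = "completed"
    failed = "failed"


class Channel(Base):
    __tablename__ = "channels"

    id: Mapped[int] = mapped_column(primary_key=True)
    channel_id: Mapped[str] = mapped_column(String, unique=True, index=True)
    title: Mapped[str]
    link: Mapped[str]
    last_fetched: Mapped[datetime] = mapped_column(DateTime)

    videos: Mapped[list["VideoEntry"]] = relationship(back_populates="channel")


class VideoEntry(Base):
    __tablename__ = "video_entries"
    __table_args__ = (Index("idx_channel_created", "channel_id", "created_at"),)

    id: Mapped[int] = mapped_column(primary_key=True)
    channel_id: Mapped[int] = mapped_column(ForeignKey("channels.id"))
    title: Mapped[str]
    link: Mapped[str] = mapped_column(String, unique=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
    download_status: Mapped[DownloadStatus] = mapped_column(
        Enum(DownloadStatus), default=DownloadStatus.pending, index=True
    )
    # download_path, download_started_at, download_completed_at,
    # download_error, and file_size elided for brevity

    channel: Mapped["Channel"] = relationship(back_populates="videos")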

database.py - Database configuration and session management

  • DATABASE_URL: Database URL from environment variable (PostgreSQL in production, SQLite fallback for local dev)
  • engine: SQLAlchemy engine instance
  • init_db(): Creates all tables
  • get_db_session(): Context manager for database sessions (sketched below)
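
A minimal sketch of this configuration pattern, assuming standard SQLAlchemy session handling (the real database.py may differ):

import os
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Base

# PostgreSQL in production; fall back to SQLite for local development
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///yottob.db")
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)


def init_db():
    # Create any tables that do not exist yet
    Base.metadata.create_all(engine)


@contextmanager
def get_db_session():
    session = SessionLocal()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()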

Async Task Queue Layer

celery_app.py - Celery configuration

  • Celery instance configured with Redis broker
  • Task serialization and worker configuration
  • 1-hour task timeout with automatic retries (configuration sketched below)
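
A sketch of what this configuration plausibly looks like (the exact settings in celery_app.py may differ):

import os

from celery import Celery

celery = Celery(
    "yottob",
    broker=os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/0"),
)
celery.conf.update(
    task_serializer="json",
    accept_content=["json"],
    task_time_limit=3600,  # 1-hour hard timeout per task
)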

download_service.py - Video download tasks

  • download_video(video_id): Celery task to download a single video as MP4
    • Uses yt-dlp with MP4 format preference
    • Updates database with download progress and status
    • Automatic retry on failure (max 3 attempts)
  • download_videos_batch(video_ids): Queue multiple downloads
  • Downloads are saved to the downloads/ directory (see the task sketch below)
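
A simplified sketch of the task described above, assuming the Celery instance in celery_app.py is named celery (status bookkeeping trimmed; the real download_service.py also records timestamps, file size, and progress via yt-dlp hooks):

import yt_dlp

from celery_app import celery
from database import get_db_session
from models import DownloadStatus, VideoEntry


@celery.task(bind=True, max_retries=3)
def download_video(self, video_id):
    with get_db_session() as session:
        video = session.get(VideoEntry, video_id)
        video.download_status = DownloadStatus.downloading
        url = video.link
    opts = {
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
        "merge_output_format": "mp4",  # convert via FFmpeg when needed
        "outtmpl": f"downloads/{video_id}_%(title)s.%(ext)s",
    }
    try:
        with yt_dlp.YoutubeDL(opts) as ydl:
            ydl.download([url])
    except Exception as exc:
        with get_db_session() as session:
            video = session.get(VideoEntry, video_id)
            video.download_status = DownloadStatus.failed
            video.download_error = str(exc)
        raise self.retry(exc=exc)  # up to 3 automatic retries
    with get_db_session() as session:
        video = session.get(VideoEntry, video_id)
        video.download_status = DownloadStatus.completed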

Core Logic Layer

feed_parser.py - Reusable YouTube feed parsing module

  • YouTubeFeedParser: Main parser class that encapsulates channel-specific logic
  • FeedEntry: In-memory data model for feed entries
  • fetch_feed(): Fetches and parses RSS feeds
  • save_to_db(): Persists feed data to database with upsert logic
  • Independent of Flask - can be imported and used in any Python context

Web Server Layer

main.py - Flask application and routes

  • app: Flask application instance (main.py:10)
  • Database initialization on startup (main.py:16)
  • index(): Homepage route handler (main.py:21)
  • get_feed(): REST API endpoint (main.py:27) that fetches and saves to DB
  • get_channels(): Lists all tracked channels (main.py:60)
  • get_history(): Returns video history for a channel (main.py:87)
  • trigger_download(): Queue video download task (main.py:134; sketched after this list)
  • get_download_status(): Check download status (main.py:163)
  • trigger_batch_download(): Queue multiple downloads (main.py:193)
  • main(): CLI entry point for testing (main.py:251)
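
As an illustration of the pattern, the download trigger route plausibly looks like this (a sketch, not the exact main.py code; handler and module names follow the list above):

from flask import Flask, jsonify

from download_service import download_video

app = Flask(__name__)


@app.route("/api/download/<int:video_id>", methods=["POST"])
def trigger_download(video_id):
    # Queue the Celery task and return immediately; the worker does the download
    task = download_video.delay(video_id)
    return jsonify({"video_id": video_id, "task_id": task.id}), 202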

Templates

templates/index.html - Frontend HTML (currently static placeholder)

Feed Parsing Implementation

The YouTubeFeedParser class in feed_parser.py:

  • Constructs YouTube RSS feed URLs from channel IDs
  • Uses feedparser to fetch and parse feeds
  • Validates HTTP 200 status before processing
  • Optionally filters out YouTube Shorts (any entry with "shorts" in URL)
  • Returns structured dictionary with feed metadata and entries

YouTube RSS Feed URL Format:

https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}
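
The fetch-and-filter behavior described above can be sketched with feedparser directly (illustrative; the real YouTubeFeedParser wraps this in a class):

import feedparser


def fetch_channel_feed(channel_id, filter_shorts=True):
    # Build the RSS URL for the channel and fetch it
    url = f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"
    feed = feedparser.parse(url)
    if feed.get("status") != 200:
        raise RuntimeError(f"Feed request failed with HTTP {feed.get('status')}")
    entries = feed.entries
    if filter_shorts:
        # Any entry whose URL contains "shorts" is treated as a YouTube Short
        entries = [e for e in entries if "shorts" not in e.link]
    return {"title": feed.feed.title, "entries": entries}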

Database Migrations

This project uses Alembic for database schema migrations.

Create a new migration after model changes:

source .venv/bin/activate && alembic revision --autogenerate -m "Description of changes"

Apply migrations:

source .venv/bin/activate && alembic upgrade head

View migration history:

source .venv/bin/activate && alembic history

Rollback to previous version:

source .venv/bin/activate && alembic downgrade -1

Migration files location: alembic/versions/

Important notes:

  • Always review auto-generated migrations before applying
  • The database is automatically initialized on Flask app startup via init_db()
  • Migration configuration is in alembic.ini and alembic/env.py
  • Models are imported in alembic/env.py for autogenerate support

Database Schema

channels table:

  • id: Primary key
  • channel_id: YouTube channel ID (unique, indexed)
  • title: Channel title
  • link: Channel URL
  • last_fetched: Timestamp of last feed fetch

video_entries table:

  • id: Primary key
  • channel_id: Foreign key to channels.id
  • title: Video title
  • link: Video URL (unique)
  • created_at: Timestamp when video was first recorded
  • download_status: Enum (pending, downloading, completed, failed)
  • download_path: Local file path to downloaded MP4
  • download_started_at: When download began
  • download_completed_at: When download finished
  • download_error: Error message if download failed
  • file_size: Size in bytes of downloaded file
  • Index: idx_channel_created on (channel_id, created_at) for fast queries
  • Index: idx_download_status on download_status for filtering

Video Download System

The application uses Celery with Redis for asynchronous video downloads:

Download Workflow:

  1. User triggers download via /api/download/<video_id> (POST)
  2. VideoEntry status changes to "downloading"
  3. Celery worker picks up task and uses yt-dlp to download as MP4
  4. Progress updates written to database
  5. On completion, status changes to "completed" with file path
  6. On failure, status changes to "failed" with error message (auto-retry 3x)

yt-dlp Configuration:

  • Format: bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best
  • Output format: MP4 (converted if necessary using FFmpeg)
  • Output location: downloads/<video_id>_<title>.mp4
  • Progress hooks for real-time status updates (see the hook sketch below)
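
yt-dlp invokes registered progress hooks with a status dict during the download; a minimal hook along these lines could drive the database updates (illustrative only):

def progress_hook(d):
    # yt-dlp reports "downloading", "finished", or "error" in d["status"]
    if d["status"] == "downloading":
        downloaded = d.get("downloaded_bytes", 0)
        total = d.get("total_bytes") or d.get("total_bytes_estimate")
        # ...persist progress to the VideoEntry row here...
    elif d["status"] == "finished":
        print(f"Download complete: {d['filename']}")

# Registered via the yt-dlp options: {"progress_hooks": [progress_hook]}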

Requirements:

  • Redis server must be running (default: localhost:6379; configurable via CELERY_BROKER_URL)
  • Celery worker must be running to process downloads
  • FFmpeg recommended for format conversion (yt-dlp will use it if available)

Environment Variables

All environment variables can be configured in the .env file (see .env.example for a template); an illustrative example follows this list:

  • DATABASE_URL: PostgreSQL connection string (default: sqlite:///yottob.db for local dev)
  • CELERY_BROKER_URL: Redis URL for Celery broker (default: redis://localhost:6379/0)
  • CELERY_RESULT_BACKEND: Redis URL for Celery results (default: redis://localhost:6379/0)
  • FLASK_ENV: Flask environment (development or production)
  • POSTGRES_USER: PostgreSQL username (for docker-compose)
  • POSTGRES_PASSWORD: PostgreSQL password (for docker-compose)
  • POSTGRES_DB: PostgreSQL database name (for docker-compose)
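
An example .env for local development, combining the defaults above with the PostgreSQL credentials from the docker run example (adjust to your setup):

DATABASE_URL=postgresql://yottob:yottob_password@localhost:5432/yottob
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
FLASK_ENV=development
POSTGRES_USER=yottob
POSTGRES_PASSWORD=yottob_password
POSTGRES_DB=yottob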

Docker Compose Services

The application consists of 4 services defined in docker-compose.yml:

  1. postgres: PostgreSQL 16 database with persistent volume
  2. redis: Redis 7 message broker for Celery
  3. app: Flask web application (exposed on port 5000)
  4. celery: Celery worker for async video downloads

All services have health checks and automatic restarts configured.
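
For orientation, an abridged docker-compose.yml showing the shape of one service (the real file defines all four, each with similar healthcheck and restart stanzas):

services:
  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data: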

Dependencies

  • Flask 3.1.2+: Web framework
  • feedparser 6.0.12+: RSS/Atom feed parsing
  • SQLAlchemy 2.0.0+: ORM for database operations
  • psycopg2-binary 2.9.0+: PostgreSQL database driver
  • Alembic 1.13.0+: Database migration tool
  • Celery 5.3.0+: Distributed task queue for async jobs
  • redis 5.0.0+: Python client for the Redis broker used by Celery
  • yt-dlp 2024.0.0+: YouTube video downloader
  • Python 3.14+: Required runtime version