CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
yottob is a Flask-based web application for processing YouTube RSS feeds with SQLAlchemy ORM persistence and async video downloads. The project provides both a REST API and CLI interface for fetching and parsing YouTube channel feeds, with filtering logic to exclude YouTube Shorts. All fetched feeds are automatically saved to a PostgreSQL database for historical tracking. Videos can be downloaded asynchronously as MP4 files using Celery workers and yt-dlp.
The application is containerized with Docker and uses docker-compose to orchestrate multiple services: PostgreSQL, Redis, Flask web app, and Celery worker.
Quick Start with Docker Compose (Recommended)
Prerequisites:
- Docker and Docker Compose installed
- No additional dependencies needed
Start all services:
# Copy environment variables template
cp .env.example .env
# Start all services (postgres, redis, app, celery)
docker-compose up -d
# View logs
docker-compose logs -f
# Stop all services
docker-compose down
# Stop and remove volumes (deletes database data)
docker-compose down -v
Run database migrations (first time setup or after model changes):
docker-compose exec app alembic upgrade head
Access the application:
- Web API: http://localhost:5000
- PostgreSQL: localhost:5432
- Redis: localhost:6379
Development Setup (Local Without Docker)
This project uses uv for dependency management.
Install dependencies:
uv sync
Activate virtual environment:
source .venv/bin/activate # On macOS/Linux
Set up environment variables:
cp .env.example .env
# Edit .env with your local configuration
Start PostgreSQL (choose one):
# Using Docker
docker run -d -p 5432:5432 \
-e POSTGRES_USER=yottob \
-e POSTGRES_PASSWORD=yottob_password \
-e POSTGRES_DB=yottob \
postgres:16-alpine
# Or use existing PostgreSQL installation
Start Redis:
# macOS with Homebrew
brew services start redis
# Linux
sudo systemctl start redis
# Docker
docker run -d -p 6379:6379 redis:alpine
Initialize/update database:
source .venv/bin/activate && alembic upgrade head
Start Celery worker (required for video downloads):
source .venv/bin/activate && celery -A celery_app worker --loglevel=info
Running the Application
With Docker Compose:
docker-compose up
Local development:
# Run the CLI feed parser
python main.py
# Run the Flask web application
flask --app main run
The web server exposes:
- / - Main page (renders index.html)
- /api/feed - API endpoint for fetching feeds and saving to database
- /api/channels - List all tracked channels
- /api/history/<channel_id> - Get video history for a specific channel
- /api/download/<video_id> - Trigger video download (POST)
- /api/download/status/<video_id> - Check download status (GET)
- /api/download/batch - Batch download multiple videos (POST)
API Usage Examples:
# Fetch default channel feed (automatically saves to DB)
curl http://localhost:5000/api/feed
# Fetch specific channel with options
curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true"
# List all tracked channels
curl http://localhost:5000/api/channels
# Get video history for a channel (limit 20 videos)
curl "http://localhost:5000/api/history/CHANNEL_ID?limit=20"
# Trigger download for a specific video
curl -X POST http://localhost:5000/api/download/123
# Check download status
curl http://localhost:5000/api/download/status/123
# Batch download all pending videos for a channel
curl -X POST "http://localhost:5000/api/download/batch?channel_id=CHANNEL_ID&status=pending"
# Batch download specific video IDs
curl -X POST http://localhost:5000/api/download/batch \
-H "Content-Type: application/json" \
-d '{"video_ids": [1, 2, 3, 4, 5]}'
Architecture
The codebase follows a clean layered architecture with separation of concerns:
Database Layer
models.py - SQLAlchemy ORM models
- Base: Declarative base for all models
- DownloadStatus: Enum for download states (pending, downloading, completed, failed)
- Channel: Stores YouTube channel metadata (channel_id, title, link, last_fetched)
- VideoEntry: Stores individual video entries with a foreign key to Channel, plus download tracking fields: download_status, download_path, download_started_at, download_completed_at, download_error, file_size
- Relationships: One Channel has many VideoEntry records
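A minimal, abbreviated sketch of how these models could be declared in SQLAlchemy 2.0 style; several columns from the schema documented below are omitted, and the real models.py may use different column options:

```python
# Hypothetical, abbreviated sketch of models.py; names follow the fields
# listed above, but the real module may differ.
import enum
from datetime import datetime
from typing import List, Optional

from sqlalchemy import DateTime, Enum, ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class DownloadStatus(enum.Enum):
    pending = "pending"
    downloading = "downloading"
    completed = "completed"
    failed = "failed"


class Channel(Base):
    __tablename__ = "channels"

    id: Mapped[int] = mapped_column(primary_key=True)
    channel_id: Mapped[str] = mapped_column(String, unique=True, index=True)
    title: Mapped[str] = mapped_column(String)
    link: Mapped[str] = mapped_column(String)
    last_fetched: Mapped[Optional[datetime]] = mapped_column(DateTime)
    videos: Mapped[List["VideoEntry"]] = relationship(back_populates="channel")


class VideoEntry(Base):
    __tablename__ = "video_entries"

    id: Mapped[int] = mapped_column(primary_key=True)
    channel_id: Mapped[int] = mapped_column(ForeignKey("channels.id"))
    title: Mapped[str] = mapped_column(String)
    link: Mapped[str] = mapped_column(String, unique=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
    download_status: Mapped[DownloadStatus] = mapped_column(
        Enum(DownloadStatus), default=DownloadStatus.pending
    )
    channel: Mapped["Channel"] = relationship(back_populates="videos")
```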
database.py - Database configuration and session management
- DATABASE_URL: Database URL from environment variable (PostgreSQL in production, SQLite fallback for local dev)
- engine: SQLAlchemy engine instance
- init_db(): Creates all tables
- get_db_session(): Context manager for database sessions
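A minimal sketch of this pattern, assuming a SessionLocal session factory name; the actual database.py may be organized differently:

```python
# Hypothetical sketch of database.py: DATABASE_URL with a SQLite fallback,
# plus a commit-or-rollback session context manager.
import os
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Base

DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///yottob.db")
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)


def init_db():
    """Create all tables defined on the declarative Base."""
    Base.metadata.create_all(engine)


@contextmanager
def get_db_session():
    """Yield a session; commit on success, roll back on error."""
    session = SessionLocal()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```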
Async Task Queue Layer
celery_app.py - Celery configuration
- Celery instance configured with Redis broker
- Task serialization and worker configuration
- 1-hour task timeout with automatic retries
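A sketch of what this configuration could look like using standard Celery settings; the variable name celery and the exact option set are assumptions:

```python
# Hypothetical sketch of celery_app.py: broker and result backend come from
# the environment variables documented below, with a 1-hour hard task timeout.
import os

from celery import Celery

celery = Celery(
    "yottob",
    broker=os.getenv("CELERY_BROKER_URL", "redis://localhost:6379/0"),
    backend=os.getenv("CELERY_RESULT_BACKEND", "redis://localhost:6379/0"),
)

celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    task_time_limit=3600,  # hard 1-hour timeout per task
)
```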
download_service.py - Video download tasks
- download_video(video_id): Celery task to download a single video as MP4
  - Uses yt-dlp with MP4 format preference
  - Updates database with download progress and status
  - Automatic retry on failure (max 3 attempts)
- download_videos_batch(video_ids): Queue multiple downloads
- Downloads are saved to the downloads/ directory
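A rough skeleton of the task, assuming the Celery app object is named celery and the session helper from database.py; signatures are illustrative, not copied from download_service.py:

```python
# Hypothetical skeleton of the download_video task; the real implementation
# also records download_path, timestamps, and file_size.
from celery_app import celery
from database import get_db_session
from models import DownloadStatus, VideoEntry


@celery.task(bind=True, max_retries=3)
def download_video(self, video_id):
    """Download one video as MP4, tracking status in the database."""
    with get_db_session() as session:
        entry = session.get(VideoEntry, video_id)
        entry.download_status = DownloadStatus.downloading
    try:
        # Run yt-dlp here (see the yt-dlp configuration sketch further below),
        # then mark the entry completed and record the output path.
        pass
    except Exception as exc:
        # Mark the entry failed and let Celery retry up to max_retries times.
        raise self.retry(exc=exc, countdown=60)
```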
Core Logic Layer
feed_parser.py - Reusable YouTube feed parsing module
- YouTubeFeedParser: Main parser class that encapsulates channel-specific logic
- FeedEntry: In-memory data model for feed entries
- fetch_feed(): Fetches and parses RSS feeds
- save_to_db(): Persists feed data to the database with upsert logic
- Independent of Flask; can be imported and used in any Python context
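An illustrative standalone usage; the constructor arguments and return shape are assumed from the descriptions above, not taken from feed_parser.py:

```python
# Hypothetical usage of the parser outside Flask; argument names and the
# returned structure are assumptions.
from feed_parser import YouTubeFeedParser

parser = YouTubeFeedParser(channel_id="CHANNEL_ID", filter_shorts=True)
feed = parser.fetch_feed()   # structured dict with feed metadata and entries
parser.save_to_db(feed)      # upsert channel and video entries
print(feed["entries"][:3])
```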
Web Server Layer
main.py - Flask application and routes
- app: Flask application instance (main.py:10)
- Database initialization on startup (main.py:16)
- index(): Homepage route handler (main.py:21)
- get_feed(): REST API endpoint (main.py:27) that fetches and saves to DB
- get_channels(): Lists all tracked channels (main.py:60)
- get_history(): Returns video history for a channel (main.py:87)
- trigger_download(): Queue video download task (main.py:134)
- get_download_status(): Check download status (main.py:163)
- trigger_batch_download(): Queue multiple downloads (main.py:193)
- main(): CLI entry point for testing (main.py:251)
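As an illustration of how the web layer hands work to the task queue, a minimal sketch of the download-trigger route, assuming a download_video task exposed by download_service.py; the real handler at main.py:134 includes error handling and status checks omitted here:

```python
# Hypothetical sketch of the download-trigger route.
from flask import Flask, jsonify

from download_service import download_video

app = Flask(__name__)


@app.route("/api/download/<int:video_id>", methods=["POST"])
def trigger_download(video_id):
    """Queue an async download for one video and return the Celery task id."""
    task = download_video.delay(video_id)
    return jsonify({"video_id": video_id, "task_id": task.id}), 202
```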
Templates
templates/index.html - Frontend HTML (currently static placeholder)
Feed Parsing Implementation
The YouTubeFeedParser class in feed_parser.py:
- Constructs YouTube RSS feed URLs from channel IDs
- Uses feedparser to fetch and parse feeds
- Validates HTTP 200 status before processing
- Optionally filters out YouTube Shorts (any entry with "shorts" in URL)
- Returns structured dictionary with feed metadata and entries
YouTube RSS Feed URL Format:
https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}
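A minimal sketch of these steps using feedparser directly; the project's YouTubeFeedParser class wraps equivalent logic:

```python
# Illustrative fetch-and-filter using feedparser, mirroring the behavior
# described above; not the project's actual implementation.
import feedparser

channel_id = "CHANNEL_ID"
url = f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"
feed = feedparser.parse(url)

if feed.get("status") == 200:  # validate HTTP 200 before processing
    entries = [e for e in feed.entries if "shorts" not in e.link]
    print(f"{feed.feed.title}: {len(entries)} non-Shorts entries")
```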
Database Migrations
This project uses Alembic for database schema migrations.
Create a new migration after model changes:
source .venv/bin/activate && alembic revision --autogenerate -m "Description of changes"
Apply migrations:
source .venv/bin/activate && alembic upgrade head
View migration history:
source .venv/bin/activate && alembic history
Rollback to previous version:
source .venv/bin/activate && alembic downgrade -1
Migration files location: alembic/versions/
Important notes:
- Always review auto-generated migrations before applying
- The database is automatically initialized on Flask app startup via init_db()
- Migration configuration is in alembic.ini and alembic/env.py
- Models are imported in alembic/env.py for autogenerate support (see the excerpt below)
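A hypothetical excerpt of the relevant lines in alembic/env.py; the actual file also contains the standard Alembic boilerplate:

```python
# Import the ORM metadata so `alembic revision --autogenerate` can diff the
# models against the live database schema.
from models import Base

target_metadata = Base.metadata
```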
Database Schema
channels table:
- id: Primary key
- channel_id: YouTube channel ID (unique, indexed)
- title: Channel title
- link: Channel URL
- last_fetched: Timestamp of last feed fetch
video_entries table:
- id: Primary key
- channel_id: Foreign key to channels.id
- title: Video title
- link: Video URL (unique)
- created_at: Timestamp when video was first recorded
- download_status: Enum (pending, downloading, completed, failed)
- download_path: Local file path to downloaded MP4
- download_started_at: When download began
- download_completed_at: When download finished
- download_error: Error message if download failed
- file_size: Size in bytes of downloaded file
- Index: idx_channel_created on (channel_id, created_at) for fast queries
- Index: idx_download_status on download_status for filtering
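For illustration, the kind of query the idx_channel_created index is meant to serve, written with the session helper and models described above; the exact query in the app may differ:

```python
# Recent videos for one channel, newest first; served efficiently by the
# (channel_id, created_at) composite index.
from sqlalchemy import select

from database import get_db_session
from models import VideoEntry

with get_db_session() as session:
    recent = (
        session.execute(
            select(VideoEntry)
            .where(VideoEntry.channel_id == 1)  # channels.id, not the YouTube ID
            .order_by(VideoEntry.created_at.desc())
            .limit(20)
        )
        .scalars()
        .all()
    )
```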
Video Download System
The application uses Celery with Redis for asynchronous video downloads:
Download Workflow:
- User triggers download via /api/download/<video_id> (POST)
- VideoEntry status changes to "downloading"
- Celery worker picks up task and uses yt-dlp to download as MP4
- Progress updates written to database
- On completion, status changes to "completed" with file path
- On failure, status changes to "failed" with error message (auto-retry 3x)
yt-dlp Configuration:
- Format: bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best
- Output format: MP4 (converted if necessary using FFmpeg)
- Output location: downloads/<video_id>_<title>.mp4
- Progress hooks for real-time status updates (see the sketch below)
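A hedged sketch of yt-dlp options matching this configuration; the output template and hook body are illustrative, not copied from download_service.py:

```python
# Illustrative yt-dlp invocation matching the configuration above.
import yt_dlp


def progress_hook(d):
    # d["status"] is "downloading" or "finished"; the real hook writes
    # progress back to the VideoEntry row.
    if d["status"] == "finished":
        print("Download finished; converting with FFmpeg if needed...")


ydl_opts = {
    "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
    "merge_output_format": "mp4",
    "outtmpl": "downloads/%(id)s_%(title)s.%(ext)s",
    "progress_hooks": [progress_hook],
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])
```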
Requirements:
- Redis server must be running (localhost:6379)
- Celery worker must be running to process downloads
- FFmpeg recommended for format conversion (yt-dlp will use it if available)
Environment Variables
All environment variables can be configured in .env file (see .env.example for template):
- DATABASE_URL: PostgreSQL connection string (default: sqlite:///yottob.db for local dev)
- CELERY_BROKER_URL: Redis URL for Celery broker (default: redis://localhost:6379/0)
- CELERY_RESULT_BACKEND: Redis URL for Celery results (default: redis://localhost:6379/0)
- FLASK_ENV: Flask environment (development or production)
- POSTGRES_USER: PostgreSQL username (for docker-compose)
- POSTGRES_PASSWORD: PostgreSQL password (for docker-compose)
- POSTGRES_DB: PostgreSQL database name (for docker-compose)
Docker Compose Services
The application consists of 4 services defined in docker-compose.yml:
- postgres: PostgreSQL 16 database with persistent volume
- redis: Redis 7 message broker for Celery
- app: Flask web application (exposed on port 5000)
- celery: Celery worker for async video downloads
All services have health checks and automatic restarts configured.
Dependencies
- Flask 3.1.2+: Web framework
- feedparser 6.0.12+: RSS/Atom feed parsing
- SQLAlchemy 2.0.0+: ORM for database operations
- psycopg2-binary 2.9.0+: PostgreSQL database driver
- Alembic 1.13.0+: Database migration tool
- Celery 5.3.0+: Distributed task queue for async jobs
- Redis 5.0.0+: Message broker for Celery
- yt-dlp 2024.0.0+: YouTube video downloader
- Python 3.14+: Required runtime version