CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
yottob is a Flask-based web application for processing YouTube RSS feeds with SQLAlchemy ORM persistence and async video downloads. The project provides both a REST API and CLI interface for fetching and parsing YouTube channel feeds, with filtering logic to exclude YouTube Shorts. All fetched feeds are automatically saved to a SQLite database for historical tracking. Videos can be downloaded asynchronously as MP4 files using Celery workers and yt-dlp.
Development Setup
This project uses uv for dependency management.
Install dependencies:
uv sync
Activate virtual environment:
source .venv/bin/activate # On macOS/Linux
Initialize/update database:
# Run migrations to create or update database schema
source .venv/bin/activate && alembic upgrade head
Start Redis (required for Celery):
# macOS with Homebrew
brew services start redis
# Linux
sudo systemctl start redis
# Docker
docker run -d -p 6379:6379 redis:alpine
# Verify Redis is running
redis-cli ping # Should return "PONG"
Start Celery worker (required for video downloads):
source .venv/bin/activate && celery -A celery_app worker --loglevel=info
Running the Application
Run the CLI feed parser:
python main.py
This executes the main() function which fetches and parses a YouTube channel RSS feed for testing.
Run the Flask web application:
flask --app main run
The web server exposes:
- / - Main page (renders index.html)
- /api/feed - API endpoint for fetching feeds and saving to database
- /api/channels - List all tracked channels
- /api/history/<channel_id> - Get video history for a specific channel
- /api/download/<video_id> - Trigger video download (POST)
- /api/download/status/<video_id> - Check download status (GET)
- /api/download/batch - Batch download multiple videos (POST)
API Usage Examples:
# Fetch default channel feed (automatically saves to DB)
curl http://localhost:5000/api/feed
# Fetch specific channel with options
curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true"
# List all tracked channels
curl http://localhost:5000/api/channels
# Get video history for a channel (limit 20 videos)
curl "http://localhost:5000/api/history/CHANNEL_ID?limit=20"
# Trigger download for a specific video
curl -X POST http://localhost:5000/api/download/123
# Check download status
curl http://localhost:5000/api/download/status/123
# Batch download all pending videos for a channel
curl -X POST "http://localhost:5000/api/download/batch?channel_id=CHANNEL_ID&status=pending"
# Batch download specific video IDs
curl -X POST http://localhost:5000/api/download/batch \
-H "Content-Type: application/json" \
-d '{"video_ids": [1, 2, 3, 4, 5]}'
Architecture
The codebase follows a clean layered architecture with separation of concerns:
Database Layer
models.py - SQLAlchemy ORM models
- Base: Declarative base for all models
- DownloadStatus: Enum for download states (pending, downloading, completed, failed)
- Channel: Stores YouTube channel metadata (channel_id, title, link, last_fetched)
- VideoEntry: Stores individual video entries with foreign key to Channel, plus download tracking fields: download_status, download_path, download_started_at, download_completed_at, download_error, file_size (see the sketch after this list)
- Relationships: One Channel has many VideoEntry records
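A minimal sketch of these models, assuming standard SQLAlchemy declarative mappings; column types, defaults, and relationship options are illustrative, and the real definitions live in models.py:

```python
# models.py (illustrative sketch, not a copy of the actual file)
import enum
from datetime import datetime

from sqlalchemy import (
    Column, DateTime, Enum, ForeignKey, Index, Integer, String, Text,
)
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class DownloadStatus(enum.Enum):
    pending = "pending"
    downloading = "downloading"
    completed = "completed"
    failed = "failed"


class Channel(Base):
    __tablename__ = "channels"

    id = Column(Integer, primary_key=True)
    channel_id = Column(String, unique=True, index=True, nullable=False)
    title = Column(String)
    link = Column(String)
    last_fetched = Column(DateTime)

    videos = relationship("VideoEntry", back_populates="channel")


class VideoEntry(Base):
    __tablename__ = "video_entries"

    id = Column(Integer, primary_key=True)
    channel_id = Column(Integer, ForeignKey("channels.id"), nullable=False)
    title = Column(String)
    link = Column(String, unique=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    # Download tracking fields
    download_status = Column(Enum(DownloadStatus), default=DownloadStatus.pending)
    download_path = Column(String)
    download_started_at = Column(DateTime)
    download_completed_at = Column(DateTime)
    download_error = Column(Text)
    file_size = Column(Integer)

    channel = relationship("Channel", back_populates="videos")

    __table_args__ = (
        Index("idx_channel_created", "channel_id", "created_at"),
        Index("idx_download_status", "download_status"),
    )
```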
database.py - Database configuration and session management
- DATABASE_URL: SQLite database location (yottob.db)
- engine: SQLAlchemy engine instance
- init_db(): Creates all tables
- get_db_session(): Context manager for database sessions
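A sketch of what this module provides, assuming a sessionmaker-backed context manager; the names match the list above, but details in the actual file may differ:

```python
# database.py (illustrative sketch)
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Base

DATABASE_URL = "sqlite:///yottob.db"

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)


def init_db():
    """Create all tables defined on the declarative Base."""
    Base.metadata.create_all(engine)


@contextmanager
def get_db_session():
    """Yield a session, committing on success and rolling back on error."""
    session = SessionLocal()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```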
Async Task Queue Layer
celery_app.py - Celery configuration
- Celery instance configured with Redis broker
- Task serialization and worker configuration
- 1-hour task timeout with automatic retries
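A sketch of this wiring, assuming Redis on localhost:6379 and JSON serialization; the exact settings in celery_app.py may differ:

```python
# celery_app.py (illustrative sketch; broker URL and limits are assumptions)
from celery import Celery

celery_app = Celery(
    "yottob",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
    include=["download_service"],  # register the download tasks with the worker
)

celery_app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    task_time_limit=3600,  # 1-hour hard timeout per task
)
```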
download_service.py - Video download tasks
- download_video(video_id): Celery task to download a single video as MP4
- Uses yt-dlp with MP4 format preference
- Updates database with download progress and status
- Automatic retry on failure (max 3 attempts)
- download_videos_batch(video_ids): Queue multiple downloads
- Downloads saved to the downloads/ directory
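A condensed sketch of the download task, reusing the session helper and models sketched earlier. The yt-dlp option names are real, but the control flow and file-naming details are illustrative rather than a copy of download_service.py:

```python
# download_service.py (illustrative sketch)
from datetime import datetime
from pathlib import Path

import yt_dlp

from celery_app import celery_app
from database import get_db_session
from models import DownloadStatus, VideoEntry

DOWNLOAD_DIR = Path("downloads")
DOWNLOAD_DIR.mkdir(exist_ok=True)


@celery_app.task(bind=True, max_retries=3)
def download_video(self, video_id):
    """Download a single VideoEntry as MP4 and record status in the DB."""
    with get_db_session() as session:
        video = session.get(VideoEntry, video_id)
        video.download_status = DownloadStatus.downloading
        video.download_started_at = datetime.utcnow()
        url = video.link

    ydl_opts = {
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
        "outtmpl": str(DOWNLOAD_DIR / f"{video_id}_%(title)s.%(ext)s"),
        "merge_output_format": "mp4",  # remux/convert to MP4 via FFmpeg when needed
    }

    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info = ydl.extract_info(url, download=True)
            # Approximate the merged output path; the real code may track this differently
            filepath = Path(ydl.prepare_filename(info)).with_suffix(".mp4")
        with get_db_session() as session:
            video = session.get(VideoEntry, video_id)
            video.download_status = DownloadStatus.completed
            video.download_completed_at = datetime.utcnow()
            video.download_path = str(filepath)
            video.file_size = filepath.stat().st_size if filepath.exists() else None
    except Exception as exc:
        with get_db_session() as session:
            video = session.get(VideoEntry, video_id)
            video.download_status = DownloadStatus.failed
            video.download_error = str(exc)
        raise self.retry(exc=exc, countdown=60)  # up to 3 automatic retries


@celery_app.task
def download_videos_batch(video_ids):
    """Queue one download_video task per ID."""
    for vid in video_ids:
        download_video.delay(vid)
```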
Core Logic Layer
feed_parser.py - Reusable YouTube feed parsing module
- YouTubeFeedParser: Main parser class that encapsulates channel-specific logic
- FeedEntry: In-memory data model for feed entries
- fetch_feed(): Fetches and parses RSS feeds
- save_to_db(): Persists feed data to database with upsert logic
- Independent of Flask - can be imported and used in any Python context
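A hedged usage example of that Flask-independent workflow; the constructor signature, the channel ID, and the shape of the returned dictionary are assumptions based on the descriptions above:

```python
# Illustrative standalone usage of the feed parser
from feed_parser import YouTubeFeedParser

parser = YouTubeFeedParser(channel_id="UC_x5XG1OV2P6uZZ5FSM9Ttw")  # example channel ID
feed = parser.fetch_feed()   # fetch and parse the channel's RSS feed
parser.save_to_db(feed)      # upsert channel and video rows into yottob.db

for entry in feed["entries"]:
    print(entry["title"], entry["link"])
```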
Web Server Layer
main.py - Flask application and routes
- app: Flask application instance (main.py:10)
- Database initialization on startup (main.py:16)
- index(): Homepage route handler (main.py:21)
- get_feed(): REST API endpoint (main.py:27) that fetches and saves to DB
- get_channels(): Lists all tracked channels (main.py:60)
- get_history(): Returns video history for a channel (main.py:87)
- trigger_download(): Queue video download task (main.py:134)
- get_download_status(): Check download status (main.py:163)
- trigger_batch_download(): Queue multiple downloads (main.py:193)
- main(): CLI entry point for testing (main.py:251)
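A sketch of how the download trigger route might hand work off to Celery, using the helpers sketched above; the real handler at main.py:134 may validate and respond differently:

```python
# Illustrative route sketch (not the actual main.py)
from flask import Flask, jsonify

from database import get_db_session, init_db
from download_service import download_video
from models import VideoEntry

app = Flask(__name__)
init_db()  # the real app also initializes the DB on startup


@app.route("/api/download/<int:video_id>", methods=["POST"])
def trigger_download(video_id):
    """Check the video exists, then enqueue the download for a Celery worker."""
    with get_db_session() as session:
        if session.get(VideoEntry, video_id) is None:
            return jsonify({"error": "video not found"}), 404
    download_video.delay(video_id)  # enqueue; the worker does the actual download
    return jsonify({"video_id": video_id, "status": "queued"}), 202
```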
Templates
templates/index.html - Frontend HTML (currently static placeholder)
Feed Parsing Implementation
The YouTubeFeedParser class in feed_parser.py:
- Constructs YouTube RSS feed URLs from channel IDs
- Uses feedparser to fetch and parse feeds
- Validates HTTP 200 status before processing
- Optionally filters out YouTube Shorts (any entry with "shorts" in URL)
- Returns structured dictionary with feed metadata and entries
YouTube RSS Feed URL Format:
https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}
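A standalone sketch of that documented behavior (not the actual feed_parser.py code), assuming feedparser's usual attribute access:

```python
# Illustrative feed fetch + Shorts filter
import feedparser

FEED_URL = "https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"


def fetch_entries(channel_id, filter_shorts=True):
    feed = feedparser.parse(FEED_URL.format(channel_id=channel_id))
    status = getattr(feed, "status", None)
    if status != 200:  # validate HTTP 200 before processing
        raise RuntimeError(f"Feed request failed with status {status}")
    entries = [
        {"title": e.title, "link": e.link}
        for e in feed.entries
        if not (filter_shorts and "shorts" in e.link)  # skip YouTube Shorts
    ]
    return {"title": feed.feed.get("title"), "entries": entries}
```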
Database Migrations
This project uses Alembic for database schema migrations.
Create a new migration after model changes:
source .venv/bin/activate && alembic revision --autogenerate -m "Description of changes"
Apply migrations:
source .venv/bin/activate && alembic upgrade head
View migration history:
source .venv/bin/activate && alembic history
Rollback to previous version:
source .venv/bin/activate && alembic downgrade -1
Migration files location: alembic/versions/
Important notes:
- Always review auto-generated migrations before applying
- The database is automatically initialized on Flask app startup via init_db()
- Migration configuration is in alembic.ini and alembic/env.py
- Models are imported in alembic/env.py for autogenerate support
Database Schema
channels table:
- id: Primary key
- channel_id: YouTube channel ID (unique, indexed)
- title: Channel title
- link: Channel URL
- last_fetched: Timestamp of last feed fetch
video_entries table:
- id: Primary key
- channel_id: Foreign key to channels.id
- title: Video title
- link: Video URL (unique)
- created_at: Timestamp when video was first recorded
- download_status: Enum (pending, downloading, completed, failed)
- download_path: Local file path to downloaded MP4
- download_started_at: When download began
- download_completed_at: When download finished
- download_error: Error message if download failed
- file_size: Size in bytes of downloaded file
- Index: idx_channel_created on (channel_id, created_at) for fast queries
- Index: idx_download_status on download_status for filtering
Video Download System
The application uses Celery with Redis for asynchronous video downloads:
Download Workflow:
- User triggers download via /api/download/<video_id> (POST)
- VideoEntry status changes to "downloading"
- Celery worker picks up task and uses yt-dlp to download as MP4
- Progress updates written to database
- On completion, status changes to "completed" with file path
- On failure, status changes to "failed" with error message (auto-retry 3x)
yt-dlp Configuration:
- Format: bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best
- Output format: MP4 (converted if necessary using FFmpeg)
- Output location: downloads/<video_id>_<title>.mp4
- Progress hooks for real-time status updates (see the options sketch below)
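A sketch of an options dictionary matching this configuration. The option keys and progress-hook fields are real yt-dlp names, but the hook body is illustrative and the output template uses yt-dlp's %(id)s placeholder as a stand-in for the documented <video_id> prefix:

```python
# Illustrative yt-dlp options with a progress hook
def progress_hook(d):
    # d["status"] is "downloading", "finished", or "error"
    if d["status"] == "downloading":
        print(d.get("_percent_str", ""), d.get("filename", ""))


ydl_opts = {
    "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
    "outtmpl": "downloads/%(id)s_%(title)s.%(ext)s",
    "merge_output_format": "mp4",        # convert/remux to MP4 (uses FFmpeg if available)
    "progress_hooks": [progress_hook],   # called with status updates during download
}
```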
Requirements:
- Redis server must be running (localhost:6379)
- Celery worker must be running to process downloads
- FFmpeg recommended for format conversion (yt-dlp will use it if available)
Dependencies
- Flask 3.1.2+: Web framework
- feedparser 6.0.12+: RSS/Atom feed parsing
- SQLAlchemy 2.0.0+: ORM for database operations
- Alembic 1.13.0+: Database migration tool
- Celery 5.3.0+: Distributed task queue for async jobs
- Redis 5.0.0+: Message broker for Celery
- yt-dlp 2024.0.0+: YouTube video downloader
- Python 3.14+: Required runtime version