
Ryan Chen 637cf3b315 Update CLAUDE.md with comprehensive authentication documentation

- Added User Authentication section with security features
- Updated Frontend Interface section to note authentication requirement
- Updated Database Schema with users table and new channel/video fields
- Updated Database Layer architecture with User model details
- Updated Dependencies to include Flask-Login and bcrypt
- Documented user data isolation and multi-tenant architecture
- Added first-time setup instructions for registration

Documentation now reflects:
- Complete authentication system
- User-scoped data model
- Enhanced video metadata fields
- Security best practices

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-26 14:30:43 -05:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

yottob is a Flask-based web application for processing YouTube RSS feeds, with SQLAlchemy ORM persistence and asynchronous video downloads. It provides both a REST API and a CLI for fetching and parsing YouTube channel feeds, with optional filtering to exclude YouTube Shorts. All fetched feeds are automatically saved to a PostgreSQL database for historical tracking. Videos can be downloaded asynchronously as MP4 files using Celery workers and yt-dlp.

The application is containerized with Docker and uses docker-compose to orchestrate multiple services: PostgreSQL, Redis, Flask web app, and Celery worker.

Quick Start with Docker Compose (Recommended)

Prerequisites:

Docker and Docker Compose installed
No additional dependencies needed

Start all services:

# Copy environment variables template
cp .env.example .env

# Start all services (postgres, redis, app, celery)
docker-compose up -d

# View logs
docker-compose logs -f

# Stop all services
docker-compose down

# Stop and remove volumes (deletes database data)
docker-compose down -v

Run database migrations (first time setup or after model changes):

docker-compose exec app alembic upgrade head

Access the application:

Web API: http://localhost:5000
PostgreSQL: localhost:5432
Redis: localhost:6379

Development Setup (Local Without Docker)

This project uses uv for dependency management.

Install dependencies:

uv sync

Activate virtual environment:

source .venv/bin/activate  # On macOS/Linux

Set up environment variables:

cp .env.example .env
# Edit .env with your local configuration

Start PostgreSQL (choose one):

# Using Docker
docker run -d -p 5432:5432 \
  -e POSTGRES_USER=yottob \
  -e POSTGRES_PASSWORD=yottob_password \
  -e POSTGRES_DB=yottob \
  postgres:16-alpine

# Or use existing PostgreSQL installation

Start Redis:

# macOS with Homebrew
brew services start redis

# Linux
sudo systemctl start redis

# Docker
docker run -d -p 6379:6379 redis:alpine

Initialize/update database:

source .venv/bin/activate && alembic upgrade head

Start Celery worker (required for video downloads):

source .venv/bin/activate && celery -A celery_app worker --loglevel=info

Running the Application

With Docker Compose:

docker-compose up

Local development:

# Run the CLI feed parser
python main.py

# Run the Flask web application
flask --app main run

User Authentication

The application includes a complete user authentication system using Flask-Login and bcrypt:

Authentication Pages:

/register - User registration with email and password
/login - User login with "remember me" functionality
/logout - User logout

Security Features:

Passwords hashed with bcrypt (salted hashes)
Session-based authentication via Flask-Login
Protected routes with @login_required decorator
User-specific data isolation (multi-tenant architecture)
Secure password requirements (minimum 8 characters)
Flash messages for all auth actions
Redirect to requested page after login

First Time Setup:

Start the application
Navigate to http://localhost:5000/register
Create an account
Log in and start subscribing to channels

User Data Isolation:

Each user can only see their own channels and videos
Channels are scoped by user_id
All routes filter data by current_user.id
Users cannot access other users' content
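
The isolation rule above comes down to adding a user_id filter to every query. The following is a minimal sketch using the standard library's sqlite3 module rather than the project's SQLAlchemy models. The helper name channels_for_user and the sample rows are hypothetical; only columns named in this document are used.

```python
import sqlite3


def channels_for_user(conn, user_id):
    """Return only the channels owned by the given user (hypothetical helper)."""
    rows = conn.execute(
        "SELECT channel_id, title FROM channels WHERE user_id = ? ORDER BY title",
        (user_id,),
    )
    return rows.fetchall()


# In-memory table with a subset of the channels columns, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE channels ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "channel_id TEXT NOT NULL, title TEXT)"
)
conn.executemany(
    "INSERT INTO channels (user_id, channel_id, title) VALUES (?, ?, ?)",
    [(1, "UC_alpha", "Alpha"), (1, "UC_beta", "Beta"), (2, "UC_alpha", "Alpha")],
)

# Two users may subscribe to the same YouTube channel; each sees only their own row.
user1_channels = channels_for_user(conn, 1)
user2_channels = channels_for_user(conn, 2)
```

In the Flask routes the same filter would presumably be driven by current_user.id, as described above.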

Frontend Interface

The application includes a full-featured web interface built with Jinja2 templates:

Pages (all require authentication):

/ - Dashboard showing user's videos sorted by date (newest first)
/channels - User's channel management page with refresh functionality
/add-channel - Form to subscribe to new YouTube channels
/watch/<video_id> - Video player page for watching downloaded videos

Features:

User registration and login system
Video grid with thumbnails and metadata
Real-time download status indicators (pending, downloading, completed, failed)
Inline video downloads from dashboard
HTML5 video player for streaming downloaded videos
Channel subscription and management
Refresh individual channels to fetch new videos
Responsive design for mobile and desktop
User-specific navigation showing username

API Endpoints:

/api/feed - Fetch YouTube channel feed and save to database (GET)
/api/channels - List all tracked channels (GET)
/api/history/<channel_id> - Get video history for a specific channel (GET)
/api/download/<video_id> - Trigger video download (POST)
/api/download/status/<video_id> - Check download status (GET)
/api/download/batch - Batch download multiple videos (POST)
/api/videos/refresh/<channel_id> - Refresh videos for a channel (POST)
/api/video/stream/<video_id> - Stream or download video file (GET)

API Usage Examples:

# Fetch default channel feed (automatically saves to DB)
curl http://localhost:5000/api/feed

# Fetch specific channel with options
curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true"

# List all tracked channels
curl http://localhost:5000/api/channels

# Get video history for a channel (limit 20 videos)
curl "http://localhost:5000/api/history/CHANNEL_ID?limit=20"

# Trigger download for a specific video
curl -X POST http://localhost:5000/api/download/123

# Check download status
curl http://localhost:5000/api/download/status/123

# Batch download all pending videos for a channel
curl -X POST "http://localhost:5000/api/download/batch?channel_id=CHANNEL_ID&status=pending"

# Batch download specific video IDs
curl -X POST http://localhost:5000/api/download/batch \
  -H "Content-Type: application/json" \
  -d '{"video_ids": [1, 2, 3, 4, 5]}'
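
The same calls can be made from Python. This is a sketch using only the standard library; it builds the requests without sending them, so it runs without a live server. The helper names batch_download_request and history_url are hypothetical, and a real call would likely also need a logged-in session, which is not shown here.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # address from the Quick Start section


def batch_download_request(video_ids):
    """Build (but do not send) a POST to /api/download/batch."""
    body = json.dumps({"video_ids": list(video_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/download/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(channel_id, limit=20):
    """Build the URL for a channel's video history."""
    return f"{BASE_URL}/api/history/{channel_id}?{urlencode({'limit': limit})}"


req = batch_download_request([1, 2, 3])
# To send it against a running instance: urllib.request.urlopen(req)
```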

Architecture

The codebase follows a clean layered architecture with separation of concerns:

Database Layer

models.py - SQLAlchemy ORM models

Base: Declarative base for all models
User: User model with Flask-Login integration
- Stores username, email, password_hash, created_at
- Methods: set_password(), check_password() for bcrypt password handling
- Implements UserMixin for Flask-Login compatibility
DownloadStatus: Enum for download states (pending, downloading, completed, failed)
Channel: Stores YouTube channel metadata per user
- Fields: user_id, channel_id, title, link, rss_url, last_fetched_at
- Unique constraint: (user_id, channel_id) - a user cannot subscribe to the same channel twice
VideoEntry: Stores individual video entries with full metadata
- Fields: video_id, title, video_url, thumbnail_url, description, published_at
- Download tracking: download_status, download_path, download_started_at, download_completed_at, download_error, file_size
- Unique constraint: (video_id, channel_id) - prevents duplicate videos
Relationships:
- One User has many Channels
- One Channel has many VideoEntries

database.py - Database configuration and session management

DATABASE_URL: Database URL from environment variable (PostgreSQL in production, SQLite fallback for local dev)
engine: SQLAlchemy engine instance
init_db(): Creates all tables
get_db_session(): Context manager for database sessions

Async Task Queue Layer

celery_app.py - Celery configuration

Celery instance configured with Redis broker
Task serialization and worker configuration
1-hour task timeout with automatic retries

download_service.py - Video download tasks

download_video(video_id): Celery task to download a single video as MP4
- Uses yt-dlp with MP4 format preference
- Updates database with download progress and status
- Automatic retry on failure (max 3 attempts)
download_videos_batch(video_ids): Queue multiple downloads
Downloads saved to downloads/ directory

Core Logic Layer

feed_parser.py - Reusable YouTube feed parsing module

YouTubeFeedParser: Main parser class that encapsulates channel-specific logic
FeedEntry: In-memory data model for feed entries
fetch_feed(): Fetches and parses RSS feeds
save_to_db(): Persists feed data to database with upsert logic
Independent of Flask - can be imported and used in any Python context
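
The upsert behaviour of save_to_db() can be pictured as "look up by the unique key, then insert or update". The sketch below uses the standard library's sqlite3 module, not the project's SQLAlchemy session. The function name upsert_video is hypothetical, and the choice to update only the title is an assumption; this document does not say which fields are refreshed.

```python
import sqlite3


def upsert_video(conn, channel_id, video_id, title):
    """Insert a video, or update its title if (video_id, channel_id) already exists."""
    row = conn.execute(
        "SELECT id FROM video_entries WHERE video_id = ? AND channel_id = ?",
        (video_id, channel_id),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO video_entries (channel_id, video_id, title) VALUES (?, ?, ?)",
            (channel_id, video_id, title),
        )
        return "inserted"
    conn.execute("UPDATE video_entries SET title = ? WHERE id = ?", (title, row[0]))
    return "updated"


# Reduced video_entries table carrying the unique constraint described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video_entries ("
    "id INTEGER PRIMARY KEY, channel_id INTEGER NOT NULL, "
    "video_id TEXT NOT NULL, title TEXT, "
    "UNIQUE (video_id, channel_id))"
)

first = upsert_video(conn, 1, "abc123", "Original title")
second = upsert_video(conn, 1, "abc123", "Edited title")
stored = conn.execute("SELECT video_id, title FROM video_entries").fetchall()
```

Fetching the same feed twice therefore leaves one row per video instead of raising a uniqueness error.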

Web Server Layer

main.py - Flask application and routes

Frontend Routes:

index(): Dashboard page with all videos sorted by date (main.py:24)
channels_page(): Channel management page (main.py:40)
add_channel_page(): Add channel form and subscription handler (main.py:52)
watch_video(): Video player page (main.py:94)

API Routes:

get_feed(): Fetch YouTube feed and save to database (main.py:110)
get_channels(): List all tracked channels (main.py:145)
get_history(): Video history for a channel (main.py:172)
trigger_download(): Queue video download task (main.py:216)
get_download_status(): Check download status (main.py:258)
trigger_batch_download(): Queue multiple downloads (main.py:290)
refresh_channel_videos(): Refresh videos for a channel (main.py:347)
stream_video(): Stream or download video file (main.py:391)

Frontend Templates

templates/base.html - Base template with navigation and common layout

Navigation bar with logo and menu
Flash message display system
Common styles and responsive design

templates/dashboard.html - Main video listing page

Video grid sorted by published date (newest first)
Thumbnail display with download status badges
Inline download buttons for pending videos
Empty state for new installations

templates/channels.html - Channel management interface

List of subscribed channels with metadata
Refresh button to fetch new videos per channel
Link to add new channels
Video count and last updated timestamps

templates/add_channel.html - Channel subscription form

Form to input YouTube RSS feed URL
Help section with instructions on finding RSS URLs
Examples and format guidance

templates/watch.html - Video player page

HTML5 video player for downloaded videos
Download status placeholders (downloading, failed, pending)
Video metadata (title, channel, publish date)
Download button for pending videos
Auto-refresh when video is downloading

static/style.css - Application styles

Dark theme inspired by YouTube
Responsive grid layout
Video card components
Form styling
Badge and button components

Feed Parsing Implementation

The YouTubeFeedParser class in feed_parser.py:

Constructs YouTube RSS feed URLs from channel IDs
Uses feedparser to fetch and parse feeds
Validates HTTP 200 status before processing
Optionally filters out YouTube Shorts (any entry with "shorts" in URL)
Returns a structured dictionary with feed metadata and entries

YouTube RSS Feed URL Format:

https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}
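
The URL construction and Shorts filter described above can be sketched in a few lines. This is an illustration, not the YouTubeFeedParser code itself: the function names are hypothetical, and entries are assumed to be dictionaries with a "link" key.

```python
from urllib.parse import urlencode

FEED_BASE = "https://www.youtube.com/feeds/videos.xml"


def build_feed_url(channel_id):
    """Build the RSS feed URL for a channel ID."""
    return f"{FEED_BASE}?{urlencode({'channel_id': channel_id})}"


def is_short(video_url):
    """Treat any URL containing "shorts" as a YouTube Short."""
    return "shorts" in video_url


def filter_shorts(entries):
    """Drop entries whose link points at a Short."""
    return [entry for entry in entries if not is_short(entry["link"])]


sample = [
    {"title": "Full video", "link": "https://www.youtube.com/watch?v=VIDEO_ID"},
    {"title": "Short clip", "link": "https://www.youtube.com/shorts/SHORT_ID"},
]
kept = filter_shorts(sample)
```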

Database Migrations

This project uses Alembic for database schema migrations.

Create a new migration after model changes:

source .venv/bin/activate && alembic revision --autogenerate -m "Description of changes"

Apply migrations:

source .venv/bin/activate && alembic upgrade head

View migration history:

source .venv/bin/activate && alembic history

Rollback to previous version:

source .venv/bin/activate && alembic downgrade -1

Migration files location: alembic/versions/

Important notes:

Always review auto-generated migrations before applying
The database is automatically initialized on Flask app startup via init_db()
Migration configuration is in alembic.ini and alembic/env.py
Models are imported in alembic/env.py for autogenerate support

Database Schema

users table:

id: Primary key
username: Unique username (indexed)
email: Unique email address (indexed)
password_hash: Bcrypt-hashed password
created_at: Timestamp when user registered

channels table:

id: Primary key
user_id: Foreign key to users.id (indexed)
channel_id: YouTube channel ID (indexed)
title: Channel title
link: Channel URL
rss_url: YouTube RSS feed URL
last_fetched_at: Timestamp of last feed fetch
Unique index: idx_user_channel on (user_id, channel_id) - prevents duplicate subscriptions

video_entries table:

id: Primary key
channel_id: Foreign key to channels.id
video_id: YouTube video ID (indexed)
title: Video title
video_url: YouTube video URL (indexed)
thumbnail_url: Video thumbnail URL
description: Video description
published_at: When video was published on YouTube (indexed)
created_at: Timestamp when video was first recorded
download_status: Enum (pending, downloading, completed, failed)
download_path: Local file path to downloaded MP4
download_started_at: When download began
download_completed_at: When download finished
download_error: Error message if download failed
file_size: Size in bytes of downloaded file
Unique index: idx_video_id_channel on (video_id, channel_id) - prevents duplicates
Index: idx_channel_created on (channel_id, created_at) for fast queries
Index: idx_download_status on download_status for filtering
Index: idx_published_at on published_at for date sorting

Video Download System

The application uses Celery with Redis for asynchronous video downloads:

Download Workflow:

User triggers download via /api/download/<video_id> (POST)
VideoEntry status changes to "downloading"
Celery worker picks up task and uses yt-dlp to download as MP4
Progress updates written to database
On completion, status changes to "completed" with file path
On failure, status changes to "failed" with error message (auto-retry 3x)
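
The status transitions in this workflow can be sketched without Celery. The enum values come from this document; run_download and the flaky_fetch stand-in are hypothetical. The sketch assumes three attempts in total, whereas the real task relies on Celery's own retry mechanism, whose counting may differ.

```python
from enum import Enum


class DownloadStatus(Enum):
    PENDING = "pending"
    DOWNLOADING = "downloading"
    COMPLETED = "completed"
    FAILED = "failed"


MAX_ATTEMPTS = 3  # assumption: three attempts in total


def run_download(fetch, max_attempts=MAX_ATTEMPTS):
    """Call fetch() until it succeeds or the attempts run out.

    Returns (final_status, file_path_or_None, last_error_or_None, history).
    """
    history = [DownloadStatus.PENDING, DownloadStatus.DOWNLOADING]
    error = None
    for _ in range(max_attempts):
        try:
            path = fetch()
        except Exception as exc:  # record the error and try again
            error = str(exc)
            continue
        history.append(DownloadStatus.COMPLETED)
        return DownloadStatus.COMPLETED, path, None, history
    history.append(DownloadStatus.FAILED)
    return DownloadStatus.FAILED, None, error, history


# Stand-in for the yt-dlp call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("network error")
    return "downloads/123_example.mp4"


status, path, error, history = run_download(flaky_fetch)
```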

yt-dlp Configuration:

Format: bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best
Output format: MP4 (converted if necessary using FFmpeg)
Output location: downloads/<video_id>_<title>.mp4
Progress hooks for real-time status updates
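
As a rough sketch, the settings above map onto a yt-dlp options dictionary like the one below. The helper name build_ydl_opts is hypothetical. The output template uses yt-dlp's own %(id)s field (the YouTube ID), which may differ from the project's naming if <video_id> refers to the database ID.

```python
def build_ydl_opts(download_dir="downloads", progress_hook=None):
    """Assemble options for yt_dlp.YoutubeDL from the configuration above."""
    opts = {
        # Prefer MP4 video with M4A audio, then any MP4, then the best available format.
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
        # Ask yt-dlp to merge into MP4 (this step needs FFmpeg).
        "merge_output_format": "mp4",
        "outtmpl": f"{download_dir}/%(id)s_%(title)s.%(ext)s",
    }
    if progress_hook is not None:
        opts["progress_hooks"] = [progress_hook]
    return opts


seen_statuses = []


def record_status(info):
    """Progress hook: yt-dlp passes a dict that includes a "status" key."""
    seen_statuses.append(info.get("status"))


opts = build_ydl_opts(progress_hook=record_status)
# With yt-dlp installed: yt_dlp.YoutubeDL(opts).download([video_url])
```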

Requirements:

Redis server must be running (localhost:6379)
Celery worker must be running to process downloads
FFmpeg recommended for format conversion (yt-dlp will use it if available)

Environment Variables

All environment variables can be configured in the .env file (see .env.example for a template):

DATABASE_URL: Database connection string (PostgreSQL in production; defaults to sqlite:///yottob.db for local dev)
CELERY_BROKER_URL: Redis URL for Celery broker (default: redis://localhost:6379/0)
CELERY_RESULT_BACKEND: Redis URL for Celery results (default: redis://localhost:6379/0)
FLASK_ENV: Flask environment (development or production)
POSTGRES_USER: PostgreSQL username (for docker-compose)
POSTGRES_PASSWORD: PostgreSQL password (for docker-compose)
POSTGRES_DB: PostgreSQL database name (for docker-compose)
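
Reading these variables with their documented defaults might look like the sketch below. The load_settings helper is hypothetical, and the PostgreSQL URL in the usage line is only an example assembled from the local-setup values earlier in this document.

```python
import os

# Defaults as documented in the list above.
DEFAULTS = {
    "DATABASE_URL": "sqlite:///yottob.db",
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "CELERY_RESULT_BACKEND": "redis://localhost:6379/0",
}


def load_settings(environ=None):
    """Return settings from the environment, falling back to the defaults."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# Passing a mapping instead of os.environ keeps the example reproducible.
settings = load_settings(
    {"DATABASE_URL": "postgresql://yottob:yottob_password@localhost:5432/yottob"}
)
```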

Docker Compose Services

The application consists of four services defined in docker-compose.yml:

postgres: PostgreSQL 16 database with persistent volume
redis: Redis 7 message broker for Celery
app: Flask web application (exposed on port 5000)
celery: Celery worker for async video downloads

All services have health checks and automatic restarts configured.

Dependencies

Flask 3.1.2+: Web framework
Flask-Login 0.6.0+: User session management
bcrypt 4.0.0+: Password hashing
feedparser 6.0.12+: RSS/Atom feed parsing
SQLAlchemy 2.0.0+: ORM for database operations
psycopg2-binary 2.9.0+: PostgreSQL database driver
Alembic 1.13.0+: Database migration tool
Celery 5.3.0+: Distributed task queue for async jobs
redis 5.0.0+: Python client for the Redis message broker used by Celery
yt-dlp 2024.0.0+: YouTube video downloader
Python 3.14+: Required runtime version
