Add SQLAlchemy ORM with Alembic migrations
- Added SQLAlchemy 2.0 and Alembic 1.13 dependencies
- Created models.py with Channel and VideoEntry ORM models
- Created database.py for database configuration and session management
- Initialized Alembic migration system with initial migration
- Updated feed_parser.py with save_to_db() method for persistence
- Updated main.py with database initialization and new API routes:
  - /api/feed now saves to database by default
  - /api/channels lists all tracked channels
  - /api/history/<channel_id> returns video history
- Updated .gitignore to exclude database files
- Updated CLAUDE.md with comprehensive ORM and migration documentation

Database uses SQLite (yottob.db) with upsert logic to avoid duplicates.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
.gitignore (vendored) | 6
@@ -8,3 +8,9 @@ wheels/
 
 # Virtual environments
 .venv
+
+# Database files
+*.db
+*.db-journal
+*.db-shm
+*.db-wal
CLAUDE.md | 107
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-`yottob` is a Flask-based web application for processing YouTube RSS feeds. The project provides both a REST API and CLI interface for fetching and parsing YouTube channel feeds, with filtering logic to exclude YouTube Shorts.
+`yottob` is a Flask-based web application for processing YouTube RSS feeds with SQLAlchemy ORM persistence. The project provides both a REST API and CLI interface for fetching and parsing YouTube channel feeds, with filtering logic to exclude YouTube Shorts. All fetched feeds are automatically saved to a SQLite database for historical tracking.
 
 ## Development Setup
@@ -20,6 +20,12 @@ uv sync
 source .venv/bin/activate  # On macOS/Linux
 ```
 
+**Initialize/update database:**
+```bash
+# Run migrations to create or update database schema
+source .venv/bin/activate && alembic upgrade head
+```
+
 ## Running the Application
 
 **Run the CLI feed parser:**
@@ -34,33 +40,59 @@ flask --app main run
 ```
 The web server exposes:
 - `/` - Main page (renders `index.html`)
-- `/api/feed` - API endpoint for fetching feeds
+- `/api/feed` - API endpoint for fetching feeds and saving to database
+- `/api/channels` - List all tracked channels
+- `/api/history/<channel_id>` - Get video history for a specific channel
 
-**API Usage Example:**
+**API Usage Examples:**
 ```bash
-# Fetch default channel feed
+# Fetch default channel feed (automatically saves to DB)
 curl http://localhost:5000/api/feed
 
-# Fetch specific channel (without filtering Shorts)
-curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false"
+# Fetch specific channel with options
+curl "http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true"
+
+# List all tracked channels
+curl http://localhost:5000/api/channels
+
+# Get video history for a channel (limit 20 videos)
+curl "http://localhost:5000/api/history/CHANNEL_ID?limit=20"
 ```
 
 ## Architecture
 
-The codebase follows a clean separation between business logic and web server:
+The codebase follows a clean layered architecture with separation of concerns:
 
+### Database Layer
+**`models.py`** - SQLAlchemy ORM models
+- `Base`: Declarative base for all models
+- `Channel`: Stores YouTube channel metadata (channel_id, title, link, last_fetched)
+- `VideoEntry`: Stores individual video entries with foreign key to Channel
+- Relationships: One Channel has many VideoEntry records
+
+**`database.py`** - Database configuration and session management
+- `DATABASE_URL`: SQLite database location (yottob.db)
+- `engine`: SQLAlchemy engine instance
+- `init_db()`: Creates all tables
+- `get_db_session()`: Context manager for database sessions
+
+### Core Logic Layer
 **`feed_parser.py`** - Reusable YouTube feed parsing module
 - `YouTubeFeedParser`: Main parser class that encapsulates channel-specific logic
-- `FeedEntry`: Data model for individual feed entries
+- `FeedEntry`: In-memory data model for feed entries
 - `fetch_feed()`: Fetches and parses RSS feeds
+- `save_to_db()`: Persists feed data to database with upsert logic
 - Independent of Flask - can be imported and used in any Python context
 
+### Web Server Layer
 **`main.py`** - Flask application and routes
-- `app`: Flask application instance (main.py:7)
-- `index()`: Homepage route handler (main.py:14)
-- `get_feed()`: REST API endpoint (main.py:20) that uses `YouTubeFeedParser`
-- `main()`: CLI entry point for testing (main.py:42)
+- `app`: Flask application instance (main.py:9)
+- Database initialization on startup (main.py:16)
+- `index()`: Homepage route handler (main.py:20)
+- `get_feed()`: REST API endpoint (main.py:26) that fetches and saves to DB
+- `get_channels()`: Lists all tracked channels (main.py:59)
+- `get_history()`: Returns video history for a channel (main.py:86)
+- `main()`: CLI entry point for testing (main.py:132)
 
 ### Templates
 **`templates/index.html`** - Frontend HTML (currently static placeholder)
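The endpoints documented above can also be scripted without curl; the query string is plain URL encoding. A minimal stdlib sketch — `feed_url` is a hypothetical helper for illustration, not part of the project, and no request is actually made here:

```python
from urllib.parse import urlencode

def feed_url(base, channel_id=None, filter_shorts=True, save=True):
    """Build a /api/feed URL with the documented query parameters."""
    params = {}
    if channel_id is not None:
        params["channel_id"] = channel_id
    # The server compares these parameters against the string "true"
    params["filter_shorts"] = "true" if filter_shorts else "false"
    params["save"] = "true" if save else "false"
    return f"{base}/api/feed?{urlencode(params)}"

print(feed_url("http://localhost:5000", "CHANNEL_ID", filter_shorts=False))
# http://localhost:5000/api/feed?channel_id=CHANNEL_ID&filter_shorts=false&save=true
```

The built URL can then be fetched with `urllib.request.urlopen` or any HTTP client against a running server.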
@@ -79,8 +111,59 @@ The `YouTubeFeedParser` class in `feed_parser.py`:
 https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}
 ```
 
+## Database Migrations
+
+This project uses Alembic for database schema migrations.
+
+**Create a new migration after model changes:**
+```bash
+source .venv/bin/activate && alembic revision --autogenerate -m "Description of changes"
+```
+
+**Apply migrations:**
+```bash
+source .venv/bin/activate && alembic upgrade head
+```
+
+**View migration history:**
+```bash
+source .venv/bin/activate && alembic history
+```
+
+**Rollback to previous version:**
+```bash
+source .venv/bin/activate && alembic downgrade -1
+```
+
+**Migration files location:** `alembic/versions/`
+
+**Important notes:**
+- Always review auto-generated migrations before applying
+- The database is automatically initialized on Flask app startup via `init_db()`
+- Migration configuration is in `alembic.ini` and `alembic/env.py`
+- Models are imported in `alembic/env.py` for autogenerate support
+
+## Database Schema
+
+**channels table:**
+- `id`: Primary key
+- `channel_id`: YouTube channel ID (unique, indexed)
+- `title`: Channel title
+- `link`: Channel URL
+- `last_fetched`: Timestamp of last feed fetch
+
+**video_entries table:**
+- `id`: Primary key
+- `channel_id`: Foreign key to channels.id
+- `title`: Video title
+- `link`: Video URL (unique)
+- `created_at`: Timestamp when video was first recorded
+- Index: `idx_channel_created` on (channel_id, created_at) for fast queries
+
 ## Dependencies
 
 - **Flask 3.1.2+**: Web framework
 - **feedparser 6.0.12+**: RSS/Atom feed parsing
+- **SQLAlchemy 2.0.0+**: ORM for database operations
+- **Alembic 1.13.0+**: Database migration tool
 - **Python 3.14+**: Required runtime version
alembic.ini (new file) | 147
@@ -0,0 +1,147 @@
# A generic, single database configuration.

[alembic]
# path to migration scripts.
# this is typically a path given in POSIX (e.g. forward slashes)
# format, relative to the token %(here)s which refers to the location of this
# ini file
script_location = %(here)s/alembic

# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory. for multiple paths, the path separator
# is defined by "path_separator" below.
prepend_sys_path = .


# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the tzdata library which can be installed by adding
# `alembic[tz]` to the pip requirements.
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =

# max length of characters to apply to the "slug" field
# truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to <script_location>/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "path_separator"
# below.
# version_locations = %(here)s/bar:%(here)s/bat:%(here)s/alembic/versions

# path_separator; This indicates what character is used to split lists of file
# paths, including version_locations and prepend_sys_path within configparser
# files such as alembic.ini.
# The default rendered in new alembic.ini files is "os", which uses os.pathsep
# to provide os-dependent path splitting.
#
# Note that in order to support legacy alembic.ini files, this default does NOT
# take place if path_separator is not present in alembic.ini. If this
# option is omitted entirely, fallback logic is as follows:
#
# 1. Parsing of the version_locations option falls back to using the legacy
#    "version_path_separator" key, which if absent then falls back to the legacy
#    behavior of splitting on spaces and/or commas.
# 2. Parsing of the prepend_sys_path option falls back to the legacy
#    behavior of splitting on spaces, commas, or colons.
#
# Valid values for path_separator are:
#
# path_separator = :
# path_separator = ;
# path_separator = space
# path_separator = newline
#
# Use os.pathsep. Default configuration used for new projects.
path_separator = os

# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

# database URL. This is consumed by the user-maintained env.py script only.
# other means of configuring database URLs may be customized within the env.py
# file.
sqlalchemy.url = sqlite:///yottob.db


[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# lint with attempts to fix using "ruff" - use the module runner, against the "ruff" module
# hooks = ruff
# ruff.type = module
# ruff.module = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME

# Alternatively, use the exec runner to execute a binary found on your PATH
# hooks = ruff
# ruff.type = exec
# ruff.executable = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME

# Logging configuration. This is also consumed by the user-maintained
# env.py script only.
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers = console
qualname =

[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
alembic/README (new file) | 1
@@ -0,0 +1 @@
Generic single-database configuration.
alembic/env.py (new file) | 79
@@ -0,0 +1,79 @@
from logging.config import fileConfig

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
from models import Base
target_metadata = Base.metadata

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.


def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.

    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.

    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
alembic/script.py.mako (new file) | 28
@@ -0,0 +1,28 @@
"""${message}

Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa
${imports if imports else ""}

# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, Sequence[str], None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}


def upgrade() -> None:
    """Upgrade schema."""
    ${upgrades if upgrades else "pass"}


def downgrade() -> None:
    """Downgrade schema."""
    ${downgrades if downgrades else "pass"}
@@ -0,0 +1,54 @@
"""Initial migration: Channel and VideoEntry tables

Revision ID: 270efe6976bc
Revises:
Create Date: 2025-11-26 13:55:52.270543

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = '270efe6976bc'
down_revision: Union[str, Sequence[str], None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('channels',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('channel_id', sa.String(length=50), nullable=False),
        sa.Column('title', sa.String(length=200), nullable=False),
        sa.Column('link', sa.String(length=500), nullable=False),
        sa.Column('last_fetched', sa.DateTime(), nullable=False),
        sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_channels_channel_id'), 'channels', ['channel_id'], unique=True)
    op.create_table('video_entries',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('channel_id', sa.Integer(), nullable=False),
        sa.Column('title', sa.String(length=500), nullable=False),
        sa.Column('link', sa.String(length=500), nullable=False),
        sa.Column('created_at', sa.DateTime(), nullable=False),
        sa.ForeignKeyConstraint(['channel_id'], ['channels.id'], ),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('link')
    )
    op.create_index('idx_channel_created', 'video_entries', ['channel_id', 'created_at'], unique=False)
    # ### end Alembic commands ###


def downgrade() -> None:
    """Downgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index('idx_channel_created', table_name='video_entries')
    op.drop_table('video_entries')
    op.drop_index(op.f('ix_channels_channel_id'), table_name='channels')
    op.drop_table('channels')
    # ### end Alembic commands ###
database.py (new file) | 71
@@ -0,0 +1,71 @@
"""Database configuration and session management."""

from contextlib import contextmanager
from typing import Generator

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, Session

from models import Base


# Database configuration
DATABASE_URL = "sqlite:///yottob.db"

# Create engine
engine = create_engine(
    DATABASE_URL,
    echo=False,  # Set to True for SQL query logging
    connect_args={"check_same_thread": False}  # Needed for SQLite
)

# Session factory
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


def init_db() -> None:
    """Initialize the database by creating all tables.

    This function should be called when the application starts.
    It creates all tables defined in the models if they don't exist.
    """
    Base.metadata.create_all(bind=engine)


@contextmanager
def get_db_session() -> Generator[Session, None, None]:
    """Context manager for database sessions.

    Usage:
        with get_db_session() as session:
            # Use session here
            session.query(...)

    The session is automatically committed on success and rolled back on error.
    """
    session = SessionLocal()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def get_db() -> Generator[Session, None, None]:
    """Dependency function for Flask routes to get a database session.

    Usage in Flask:
        session = next(get_db())
        try:
            # Use session
        finally:
            session.close()
    """
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
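The commit-on-success / rollback-on-error shape of `get_db_session()` above is a general transaction pattern, not specific to SQLAlchemy. A self-contained sketch of the same pattern using stdlib `sqlite3` with an in-memory database (an illustrative stand-in, not the project's session):

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE channels (channel_id TEXT UNIQUE)")

@contextmanager
def session_scope(connection):
    """Commit on success, roll back on error -- same shape as get_db_session()."""
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise

# Successful block: the insert is committed
with session_scope(conn) as s:
    s.execute("INSERT INTO channels VALUES ('UC123')")

# Failing block: the insert is rolled back before the exception propagates
try:
    with session_scope(conn) as s:
        s.execute("INSERT INTO channels VALUES ('UC456')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = [r[0] for r in conn.execute("SELECT channel_id FROM channels")]
print(rows)  # ['UC123'] -- only the committed row survives
```

The re-raise after `rollback()` matters: callers (like the `/api/feed` route's `except` block) still see the original error, but the database is left consistent.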
feed_parser.py
@@ -4,9 +4,15 @@ This module handles fetching and parsing YouTube channel RSS feeds,
 with filtering capabilities to exclude unwanted content like Shorts.
 """
 
+from datetime import datetime
 import feedparser
 from typing import Dict, List, Optional
 
+from sqlalchemy.orm import Session
+from sqlalchemy.exc import IntegrityError
+
+from models import Channel, VideoEntry
+
 
 class FeedEntry:
     """Represents a single entry in a YouTube RSS feed."""
@@ -66,3 +72,58 @@ class YouTubeFeedParser:
             "feed_link": feed.feed.link,
             "entries": [entry.to_dict() for entry in entries]
         }
+
+    def save_to_db(self, db_session: Session, feed_data: Dict) -> Channel:
+        """Save feed data to the database.
+
+        Args:
+            db_session: SQLAlchemy database session
+            feed_data: Dictionary containing feed metadata and entries (from fetch_feed)
+
+        Returns:
+            The Channel model instance
+
+        This method uses upsert logic:
+        - Updates existing channel if it exists
+        - Creates new channel if it doesn't exist
+        - Only inserts new video entries (ignores duplicates)
+        """
+        # Get or create channel
+        channel = db_session.query(Channel).filter_by(
+            channel_id=self.channel_id
+        ).first()
+
+        if channel:
+            # Update existing channel
+            channel.title = feed_data["feed_title"]
+            channel.link = feed_data["feed_link"]
+            channel.last_fetched = datetime.utcnow()
+        else:
+            # Create new channel
+            channel = Channel(
+                channel_id=self.channel_id,
+                title=feed_data["feed_title"],
+                link=feed_data["feed_link"],
+                last_fetched=datetime.utcnow()
+            )
+            db_session.add(channel)
+            db_session.flush()  # Get the channel ID
+
+        # Add video entries (ignore duplicates)
+        for entry_data in feed_data["entries"]:
+            # Check if video already exists
+            existing = db_session.query(VideoEntry).filter_by(
+                link=entry_data["link"]
+            ).first()
+
+            if not existing:
+                video = VideoEntry(
+                    channel_id=channel.id,
+                    title=entry_data["title"],
+                    link=entry_data["link"],
+                    created_at=datetime.utcnow()
+                )
+                db_session.add(video)
+
+        db_session.commit()
+        return channel
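The check-then-insert logic in `save_to_db()` is what makes repeated fetches of the same feed idempotent. A simplified stdlib `sqlite3` sketch of that behavior (the table and column names mirror the schema; the helper and sample data are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_entries (link TEXT UNIQUE, title TEXT)")

def save_entries(entries):
    """Insert only entries whose link is not already stored (check-then-insert)."""
    for e in entries:
        exists = conn.execute(
            "SELECT 1 FROM video_entries WHERE link = ?", (e["link"],)
        ).fetchone()
        if not exists:
            conn.execute(
                "INSERT INTO video_entries (link, title) VALUES (?, ?)",
                (e["link"], e["title"]),
            )
    conn.commit()

feed = [{"link": "https://youtu.be/a1", "title": "First video"}]
save_entries(feed)
save_entries(feed)  # second fetch of the same feed: nothing new to insert

count = conn.execute("SELECT COUNT(*) FROM video_entries").fetchone()[0]
print(count)  # 1
```

The unique constraint on `link` is the backstop: even if two concurrent saves raced past the existence check, the second insert would fail rather than create a duplicate row.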
main.py | 92
@@ -2,6 +2,8 @@
 
 from flask import Flask, render_template, request, jsonify
 from feed_parser import YouTubeFeedParser
+from database import init_db, get_db_session
+from models import Channel, VideoEntry
 
 
 app = Flask(__name__)
@@ -10,6 +12,11 @@ app = Flask(__name__)
 DEFAULT_CHANNEL_ID = "UCtTWOND3uyl4tVc_FarDmpw"
 
 
+# Initialize database on app startup
+with app.app_context():
+    init_db()
+
+
 @app.route("/", methods=["GET"])
 def index():
     """Render the main page."""
@@ -18,17 +25,19 @@
 
 @app.route("/api/feed", methods=["GET"])
 def get_feed():
-    """API endpoint to fetch YouTube channel feed.
+    """API endpoint to fetch YouTube channel feed and save to database.
 
     Query parameters:
         channel_id: YouTube channel ID (optional, uses default if not provided)
         filter_shorts: Whether to filter out Shorts (default: true)
+        save: Whether to save to database (default: true)
 
     Returns:
         JSON response with feed data or error message
     """
     channel_id = request.args.get("channel_id", DEFAULT_CHANNEL_ID)
     filter_shorts = request.args.get("filter_shorts", "true").lower() == "true"
+    save_to_db = request.args.get("save", "true").lower() == "true"
 
     parser = YouTubeFeedParser(channel_id)
     result = parser.fetch_feed(filter_shorts=filter_shorts)
@@ -36,9 +45,90 @@
     if result is None:
         return jsonify({"error": "Failed to fetch feed"}), 500
 
+    # Save to database if requested
+    if save_to_db:
+        try:
+            with get_db_session() as session:
+                parser.save_to_db(session, result)
+        except Exception as e:
+            return jsonify({"error": f"Failed to save to database: {str(e)}"}), 500
+
     return jsonify(result)
 
 
+@app.route("/api/channels", methods=["GET"])
+def get_channels():
+    """API endpoint to list all tracked channels.
+
+    Returns:
+        JSON response with list of channels
+    """
+    try:
+        with get_db_session() as session:
+            channels = session.query(Channel).all()
+            return jsonify({
+                "channels": [
+                    {
+                        "id": ch.id,
+                        "channel_id": ch.channel_id,
+                        "title": ch.title,
+                        "link": ch.link,
+                        "last_fetched": ch.last_fetched.isoformat(),
+                        "video_count": len(ch.videos)
+                    }
+                    for ch in channels
+                ]
+            })
+    except Exception as e:
+        return jsonify({"error": f"Failed to fetch channels: {str(e)}"}), 500
+
+
+@app.route("/api/history/<channel_id>", methods=["GET"])
+def get_history(channel_id: str):
+    """API endpoint to get video history for a specific channel.
+
+    Args:
+        channel_id: YouTube channel ID
+
+    Query parameters:
+        limit: Maximum number of videos to return (default: 50)
+
+    Returns:
+        JSON response with channel info and video history
+    """
+    limit = request.args.get("limit", "50")
+    try:
+        limit = int(limit)
+    except ValueError:
+        limit = 50
+
+    try:
+        with get_db_session() as session:
+            channel = session.query(Channel).filter_by(
+                channel_id=channel_id
+            ).first()
+
+            if not channel:
+                return jsonify({"error": "Channel not found"}), 404
+
+            videos = session.query(VideoEntry).filter_by(
+                channel_id=channel.id
+            ).order_by(VideoEntry.created_at.desc()).limit(limit).all()
+
+            return jsonify({
+                "channel": {
+                    "channel_id": channel.channel_id,
+                    "title": channel.title,
+                    "link": channel.link,
+                    "last_fetched": channel.last_fetched.isoformat()
+                },
+                "videos": [video.to_dict() for video in videos],
+                "total_videos": len(channel.videos)
+            })
+    except Exception as e:
+        return jsonify({"error": f"Failed to fetch history: {str(e)}"}), 500
+
+
 def main():
     """CLI entry point for testing feed parser."""
     parser = YouTubeFeedParser(DEFAULT_CHANNEL_ID)
models.py (new file) | 62
@@ -0,0 +1,62 @@
"""Database models for YouTube feed storage."""

from datetime import datetime
from typing import List

from sqlalchemy import String, DateTime, ForeignKey, Index
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    """Base class for all database models."""
    pass


class Channel(Base):
    """YouTube channel model."""

    __tablename__ = "channels"

    id: Mapped[int] = mapped_column(primary_key=True)
    channel_id: Mapped[str] = mapped_column(String(50), unique=True, nullable=False, index=True)
    title: Mapped[str] = mapped_column(String(200), nullable=False)
    link: Mapped[str] = mapped_column(String(500), nullable=False)
    last_fetched: Mapped[datetime] = mapped_column(DateTime, nullable=False, default=datetime.utcnow)

    # Relationship to video entries
    videos: Mapped[List["VideoEntry"]] = relationship("VideoEntry", back_populates="channel", cascade="all, delete-orphan")

    def __repr__(self) -> str:
        return f"<Channel(id={self.id}, channel_id='{self.channel_id}', title='{self.title}')>"


class VideoEntry(Base):
    """YouTube video entry model."""

    __tablename__ = "video_entries"

    id: Mapped[int] = mapped_column(primary_key=True)
    channel_id: Mapped[int] = mapped_column(ForeignKey("channels.id"), nullable=False)
    title: Mapped[str] = mapped_column(String(500), nullable=False)
    link: Mapped[str] = mapped_column(String(500), unique=True, nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime, nullable=False, default=datetime.utcnow)

    # Relationship to channel
    channel: Mapped["Channel"] = relationship("Channel", back_populates="videos")

    # Index for faster queries
    __table_args__ = (
        Index('idx_channel_created', 'channel_id', 'created_at'),
    )

    def __repr__(self) -> str:
        return f"<VideoEntry(id={self.id}, title='{self.title}', link='{self.link}')>"

    def to_dict(self) -> dict:
        """Convert to dictionary for API responses."""
        return {
            "id": self.id,
            "title": self.title,
            "link": self.link,
            "created_at": self.created_at.isoformat()
        }
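The `idx_channel_created` composite index declared above exists for exactly the access pattern `/api/history` uses: filter by one channel, order by `created_at` descending, cap with a limit. A stdlib `sqlite3` sketch of that query shape (simplified columns and sample data, purely illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video_entries (channel_id INTEGER, title TEXT, created_at TEXT)"
)
# Same composite index as the ORM's __table_args__
conn.execute(
    "CREATE INDEX idx_channel_created ON video_entries (channel_id, created_at)"
)

rows = [
    (1, "oldest", "2025-01-01T00:00:00"),
    (1, "newest", "2025-03-01T00:00:00"),
    (2, "other channel", "2025-02-01T00:00:00"),
]
conn.executemany("INSERT INTO video_entries VALUES (?, ?, ?)", rows)

# History query: one channel's videos, newest first, capped by limit
history = conn.execute(
    "SELECT title FROM video_entries "
    "WHERE channel_id = ? ORDER BY created_at DESC LIMIT ?",
    (1, 50),
).fetchall()
print([t for (t,) in history])  # ['newest', 'oldest']
```

Because the index leads with `channel_id` and then `created_at`, the database can satisfy both the equality filter and the sort from the index instead of scanning and sorting all rows.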
pyproject.toml
@@ -5,6 +5,8 @@ description = "Add your description here"
 readme = "README.md"
 requires-python = ">=3.14"
 dependencies = [
+    "alembic>=1.13.0",
     "feedparser>=6.0.12",
     "flask>=3.1.2",
+    "sqlalchemy>=2.0.0",
 ]
uv.lock (generated) | 69
@@ -2,6 +2,20 @@ version = 1
|
||||
revision = 2
|
||||
requires-python = ">=3.14"
|
||||
|
||||
[[package]]
|
||||
name = "alembic"
|
||||
version = "1.17.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "mako" },
|
||||
{ name = "sqlalchemy" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/02/a6/74c8cadc2882977d80ad756a13857857dbcf9bd405bc80b662eb10651282/alembic-1.17.2.tar.gz", hash = "sha256:bbe9751705c5e0f14877f02d46c53d10885e377e3d90eda810a016f9baa19e8e", size = 1988064, upload-time = "2025-11-14T20:35:04.057Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/ba/88/6237e97e3385b57b5f1528647addea5cc03d4d65d5979ab24327d41fb00d/alembic-1.17.2-py3-none-any.whl", hash = "sha256:f483dd1fe93f6c5d49217055e4d15b905b425b6af906746abb35b69c1996c4e6", size = 248554, upload-time = "2025-11-14T20:35:05.699Z" },
]

[[package]]
name = "blinker"
version = "1.9.0"
@@ -61,6 +75,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ec/f9/7f9263c5695f4bd0023734af91bedb2ff8209e8de6ead162f35d8dc762fd/flask-3.1.2-py3-none-any.whl", hash = "sha256:ca1d8112ec8a6158cc29ea4858963350011b5c846a414cdb7a954aa9e967d03c", size = 103308, upload-time = "2025-08-19T21:03:19.499Z" },
]

[[package]]
name = "greenlet"
version = "3.2.4"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/03/b8/704d753a5a45507a7aab61f18db9509302ed3d0a27ac7e0359ec2905b1a6/greenlet-3.2.4.tar.gz", hash = "sha256:0dca0d95ff849f9a364385f36ab49f50065d76964944638be9691e1832e9f86d", size = 188260, upload-time = "2025-08-07T13:24:33.51Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/22/5c/85273fd7cc388285632b0498dbbab97596e04b154933dfe0f3e68156c68c/greenlet-3.2.4-cp314-cp314-macosx_11_0_universal2.whl", hash = "sha256:49a30d5fda2507ae77be16479bdb62a660fa51b1eb4928b524975b3bde77b3c0", size = 273586, upload-time = "2025-08-07T13:16:08.004Z" },
{ url = "https://files.pythonhosted.org/packages/d1/75/10aeeaa3da9332c2e761e4c50d4c3556c21113ee3f0afa2cf5769946f7a3/greenlet-3.2.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:299fd615cd8fc86267b47597123e3f43ad79c9d8a22bebdce535e53550763e2f", size = 686346, upload-time = "2025-08-07T13:42:59.944Z" },
{ url = "https://files.pythonhosted.org/packages/c0/aa/687d6b12ffb505a4447567d1f3abea23bd20e73a5bed63871178e0831b7a/greenlet-3.2.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:c17b6b34111ea72fc5a4e4beec9711d2226285f0386ea83477cbb97c30a3f3a5", size = 699218, upload-time = "2025-08-07T13:45:30.969Z" },
{ url = "https://files.pythonhosted.org/packages/dc/8b/29aae55436521f1d6f8ff4e12fb676f3400de7fcf27fccd1d4d17fd8fecd/greenlet-3.2.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:b4a1870c51720687af7fa3e7cda6d08d801dae660f75a76f3845b642b4da6ee1", size = 694659, upload-time = "2025-08-07T13:53:17.759Z" },
{ url = "https://files.pythonhosted.org/packages/92/2e/ea25914b1ebfde93b6fc4ff46d6864564fba59024e928bdc7de475affc25/greenlet-3.2.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:061dc4cf2c34852b052a8620d40f36324554bc192be474b9e9770e8c042fd735", size = 695355, upload-time = "2025-08-07T13:18:34.517Z" },
{ url = "https://files.pythonhosted.org/packages/72/60/fc56c62046ec17f6b0d3060564562c64c862948c9d4bc8aa807cf5bd74f4/greenlet-3.2.4-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:44358b9bf66c8576a9f57a590d5f5d6e72fa4228b763d0e43fee6d3b06d3a337", size = 657512, upload-time = "2025-08-07T13:18:33.969Z" },
{ url = "https://files.pythonhosted.org/packages/23/6e/74407aed965a4ab6ddd93a7ded3180b730d281c77b765788419484cdfeef/greenlet-3.2.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:2917bdf657f5859fbf3386b12d68ede4cf1f04c90c3a6bc1f013dd68a22e2269", size = 1612508, upload-time = "2025-11-04T12:42:23.427Z" },
{ url = "https://files.pythonhosted.org/packages/0d/da/343cd760ab2f92bac1845ca07ee3faea9fe52bee65f7bcb19f16ad7de08b/greenlet-3.2.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:015d48959d4add5d6c9f6c5210ee3803a830dce46356e3bc326d6776bde54681", size = 1680760, upload-time = "2025-11-04T12:42:25.341Z" },
{ url = "https://files.pythonhosted.org/packages/e3/a5/6ddab2b4c112be95601c13428db1d8b6608a8b6039816f2ba09c346c08fc/greenlet-3.2.4-cp314-cp314-win_amd64.whl", hash = "sha256:e37ab26028f12dbb0ff65f29a8d3d44a765c61e729647bf2ddfbbed621726f01", size = 303425, upload-time = "2025-08-07T13:32:27.59Z" },
]

[[package]]
name = "itsdangerous"
version = "2.2.0"
@@ -82,6 +113,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" },
]

[[package]]
name = "mako"
version = "1.3.10"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "markupsafe" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9e/38/bd5b78a920a64d708fe6bc8e0a2c075e1389d53bef8413725c63ba041535/mako-1.3.10.tar.gz", hash = "sha256:99579a6f39583fa7e5630a28c3c1f440e4e97a414b80372649c0ce338da2ea28", size = 392474, upload-time = "2025-04-10T12:44:31.16Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/87/fb/99f81ac72ae23375f22b7afdb7642aba97c00a713c217124420147681a2f/mako-1.3.10-py3-none-any.whl", hash = "sha256:baef24a52fc4fc514a0887ac600f9f1cff3d82c61d4d700a1fa84d597b88db59", size = 78509, upload-time = "2025-04-10T12:50:53.297Z" },
]

[[package]]
name = "markupsafe"
version = "3.0.3"
@@ -118,6 +161,28 @@ version = "1.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/9e/bd/3704a8c3e0942d711c1299ebf7b9091930adae6675d7c8f476a7ce48653c/sgmllib3k-1.0.0.tar.gz", hash = "sha256:7868fb1c8bfa764c1ac563d3cf369c381d1325d36124933a726f29fcdaa812e9", size = 5750, upload-time = "2010-08-24T14:33:52.445Z" }

[[package]]
name = "sqlalchemy"
version = "2.0.44"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "greenlet", marker = "platform_machine == 'AMD64' or platform_machine == 'WIN32' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'ppc64le' or platform_machine == 'win32' or platform_machine == 'x86_64'" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f0/f2/840d7b9496825333f532d2e3976b8eadbf52034178aac53630d09fe6e1ef/sqlalchemy-2.0.44.tar.gz", hash = "sha256:0ae7454e1ab1d780aee69fd2aae7d6b8670a581d8847f2d1e0f7ddfbf47e5a22", size = 9819830, upload-time = "2025-10-10T14:39:12.935Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9c/5e/6a29fa884d9fb7ddadf6b69490a9d45fded3b38541713010dad16b77d015/sqlalchemy-2.0.44-py3-none-any.whl", hash = "sha256:19de7ca1246fbef9f9d1bff8f1ab25641569df226364a0e40457dc5457c54b05", size = 1928718, upload-time = "2025-10-10T15:29:45.32Z" },
]

[[package]]
name = "typing-extensions"
version = "4.15.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
]

[[package]]
name = "werkzeug"
version = "3.1.3"
@@ -135,12 +200,16 @@ name = "yottob"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "alembic" },
{ name = "feedparser" },
{ name = "flask" },
{ name = "sqlalchemy" },
]

[package.metadata]
requires-dist = [
{ name = "alembic", specifier = ">=1.13.0" },
{ name = "feedparser", specifier = ">=6.0.12" },
{ name = "flask", specifier = ">=3.1.2" },
{ name = "sqlalchemy", specifier = ">=2.0.0" },
]