The open-source alternative to Opus Clip, Vidyo.ai, Klap, SubMagic, 2short.ai, and other AI clipping tools. Drop in any long-form YouTube video and get back ranked, viral-ready 9:16 shorts — for free, with no per-clip credits, no watermarks, and full control over the highlight algorithm.
Built for creators, agencies, and developers who don't want to pay $20–$300/month or be capped on minutes processed. Uses GPT-class LLM highlight detection and Whisper transcription to extract the most viral-worthy moments and auto-crop them vertically for TikTok, Reels, and Shorts.
Building your own Opus Clip–style SaaS? Skip the infra and ship on the same APIs that power this repo:
- AI Clipping API — end-to-end clip selection + render
- Auto-Crop API — vertical reframing only
| | This repo | Opus Clip / Vidyo.ai / Klap / SubMagic |
|---|---|---|
| Price | Free + open source (pay only for API usage) | $20–$300/month subscriptions |
| Per-clip credits | None — process unlimited videos | Monthly minute caps, overage fees |
| Watermarks | Never | On free tiers |
| Highlight algorithm | Fully editable virality framework | Black box |
| Output format | Any aspect ratio, any resolution | Locked presets |
| Batch processing | `xargs` an entire URL list | Manual upload one-by-one |
| JSON / API output | Built-in (`--output-json`) | Limited or paid tier only |
| Self-hostable | Yes — runs on your machine or server | SaaS only, your videos sit on their servers |
| White-label / embeddable | Yes — MIT licensed, import as Python lib | No |
- 🎬 YouTube In, Vertical Out: Hand it any YouTube URL — get back N viral-ready 9:16 mp4s
- 🔀 Two Modes — API (fast) or Local (offline): Default `--mode api` uses MuAPI for download/transcription/cropping; `--mode local` runs entirely on your machine with `yt-dlp`, `faster-whisper`, and `ffmpeg`/`opencv`, and lets you pick OpenAI or Gemini for highlight ranking
- 🤖 Virality-Aware Highlight Selection: Clips ranked on hooks, emotional peaks, opinion bombs, revelation moments, conflict, quotable lines, story peaks, and practical value — not just generic "interesting"
- 📈 Score + Hook + Reason for Every Clip: Each highlight comes with a viral score, an opening hook line, and a one-sentence explanation of why it works
- 🎤 Whisper Transcription, Your Choice: Cloud (`/openai-whisper` via MuAPI) or local (`faster-whisper`, CPU or CUDA) — same downstream output shape
- 🧩 Long-Video Aware: Videos over 30 minutes are auto-chunked with overlap so nothing gets missed
- ♻️ Smart Dedupe: Overlapping highlights are collapsed by score so you never get two near-duplicate clips
- 🎯 Smart Vertical Crop: API mode uses MuAPI's auto-crop; local mode runs OpenCV face tracking with motion smoothing
- 📱 Any Aspect Ratio: 9:16 for TikTok/Reels/Shorts, 1:1 for square, anything else by flag
- 🧰 CLI + Python Library: Use it from the shell or import `generate_shorts(...)` into your own pipeline
- 📦 JSON Output: `--output-json` dumps the full result (transcript + every candidate highlight + final clip URLs/paths) for downstream automation
Don't want to self-host? The AI Clipping API gives you the same Opus Clip–style pipeline as a single HTTP call — no Python, no dependencies, pay-per-clip instead of monthly subscriptions.
- Python 3.10+
- For API mode (default): a MuAPI key — powers download, transcription, highlight ranking, and clipping in a single dependency
- For Local mode (`--mode local`): `ffmpeg` on your PATH and an LLM API key (`OPENAI_API_KEY` or `GEMINI_API_KEY`; only the LLM step is remote)
- Clone the repository:

  ```bash
  git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git
  cd AI-Youtube-Shorts-Generator
  ```

- Create and activate a virtual environment:

  ```bash
  python3.10 -m venv venv
  source venv/bin/activate
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  # Only if you plan to use --mode local:
  pip install -r requirements-local.txt
  ```

- Set up environment variables:

  Create a `.env` file in the project root:

  ```bash
  # API mode (default)
  MUAPI_API_KEY=your_muapi_key_here

  # Local mode (--mode local)
  LLM_PROVIDER=openai            # openai or gemini
  OPENAI_API_KEY=your_openai_key_here
  OPENAI_MODEL=gpt-4o-mini       # optional, default gpt-4o-mini
  GEMINI_API_KEY=your_gemini_key_here
  GEMINI_MODEL=gemini-2.5-flash  # optional, default gemini-2.5-flash
  LOCAL_WHISPER_MODEL=base       # tiny / base / small / medium / large-v3
  LOCAL_WHISPER_DEVICE=auto      # auto / cpu / cuda
  LOCAL_OUTPUT_DIR=output        # where local mp4s land
  ```
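Before the first run you can optionally sanity-check the setup. This is a hypothetical preflight snippet, not part of the repo; the `mode` variable, the assertions, and the optional `python-dotenv` import are all illustrative:

```python
# Hypothetical preflight check — not part of this repo.
import os
import shutil

try:
    from dotenv import load_dotenv  # optional; reads the .env created above
    load_dotenv()
except ImportError:
    pass  # fall back to variables already exported in the shell

mode = "api"  # or "local"
if mode == "api":
    assert os.getenv("MUAPI_API_KEY"), "API mode needs MUAPI_API_KEY"
else:
    assert shutil.which("ffmpeg"), "local mode needs ffmpeg on PATH"
    assert os.getenv("OPENAI_API_KEY") or os.getenv("GEMINI_API_KEY"), \
        "local mode needs OPENAI_API_KEY or GEMINI_API_KEY"
print("prerequisites look good")
```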
python main.py "https://www.youtube.com/watch?v=VIDEO_ID"python main.py "https://www.youtube.com/watch?v=VIDEO_ID" --mode localLocal mode writes the rendered shorts to ./output/short_01.mp4, short_02.mp4, … (override with LOCAL_OUTPUT_DIR).
python main.py "https://www.youtube.com/watch?v=VIDEO_ID" \
--mode api \
--num-clips 5 \
--aspect-ratio 9:16 \
--output-json result.jsonIn --mode local, you can pass a file:// URL or a direct filesystem path and skip YouTube entirely:
python main.py "/Users/you/Videos/input.mp4" --mode local
python main.py "file:///Users/you/Videos/input.mp4" --mode localThe Python API works the same way:
```python
from shorts_generator import generate_shorts

result = generate_shorts(
    "/Users/you/Videos/input.mp4",
    num_clips=5,
    aspect_ratio="9:16",
    mode="local",
)

for short in result["shorts"]:
    print(short["score"], short["title"], short["clip_url"])
```

Local transcription is cached as an `.srt` file in `LOCAL_OUTPUT_DIR` using the video's base name. If the cache already exists and is newer than the source file, the app reuses it instead of running Whisper again.
Local downloads are also cached in `LOCAL_OUTPUT_DIR` as `source_<youtube_id>.mp4` when the input is a YouTube URL. If that file already exists, the app skips `yt-dlp` and reuses the cached video.
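Both caches come down to a filename convention plus a freshness check. A minimal sketch of that logic (illustrative only; the repo's actual helpers may differ):

```python
# Illustrative sketch of the two local caches described above —
# not the repo's exact code.
import os
import pathlib

OUTPUT_DIR = pathlib.Path(os.getenv("LOCAL_OUTPUT_DIR", "output"))

def cached_srt(video_path: pathlib.Path) -> pathlib.Path | None:
    """Reuse <video stem>.srt if it exists and is newer than the source video."""
    srt = OUTPUT_DIR / f"{video_path.stem}.srt"
    if srt.exists() and srt.stat().st_mtime >= video_path.stat().st_mtime:
        return srt
    return None  # stale or missing → run Whisper again

def cached_download(youtube_id: str) -> pathlib.Path | None:
    """Skip yt-dlp when source_<youtube_id>.mp4 is already on disk."""
    mp4 = OUTPUT_DIR / f"source_{youtube_id}.mp4"
    return mp4 if mp4.exists() else None
```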
Create a `urls.txt` file with one URL per line, then:

```bash
xargs -a urls.txt -I{} python main.py "{}"
```

| Flag | Default | Notes |
|---|---|---|
| `--mode` | `api` | `api` (MuAPI, fast, no setup) or `local` (remote URL, `file://`, or local path + `faster-whisper` + LLM provider + `ffmpeg`) |
| `--num-clips` | `3` | How many shorts to render |
| `--aspect-ratio` | `9:16` | Any ratio; `9:16` for TikTok/Reels, `1:1` for square |
| `--format` | `720` | Source download resolution: `360` / `480` / `720` / `1080` |
| `--language` | auto | Force Whisper language code (e.g. `en`) |
| `--output-json` | — | Dump the full result (transcript + all candidates) to a file |
| Step | API mode (`--mode api`) | Local mode (`--mode local`) |
|---|---|---|
| Download | MuAPI `/youtube-download` | `yt-dlp` for remote URLs, direct file path for local inputs |
| Transcription | MuAPI `/openai-whisper` | `faster-whisper` (CPU or CUDA) |
| Highlight LLM | MuAPI `gpt-5-mini` | `LLM_PROVIDER=openai` uses OpenAI (`gpt-4o-mini` by default), `LLM_PROVIDER=gemini` uses Gemini (`gemini-2.5-flash` by default) |
| Vertical crop | MuAPI `/autocrop` | `ffmpeg` + OpenCV face tracking |
| Output | Hosted URLs | Local mp4 paths |
| Required keys | `MUAPI_API_KEY` | `OPENAI_API_KEY` or `GEMINI_API_KEY` (+ `ffmpeg` on PATH) |
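To give a feel for the local crop backend, here is a hedged sketch of face detection with exponential motion smoothing. The Haar cascade choice and the smoothing factor are assumptions for illustration; the real `local/clipper.py` is more involved:

```python
# Hedged sketch of face-centered crop tracking with motion smoothing —
# illustrative, not the repo's actual local/clipper.py.
import cv2

def smoothed_crop_centers(video_path: str, alpha: float = 0.2) -> list[int]:
    """Per-frame x-center of the largest detected face, exponentially smoothed."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    centers: list[int] = []
    center = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces):
            x, _, w, _ = max(faces, key=lambda f: f[2] * f[3])  # largest face
            target = int(x + w // 2)
            # Exponential smoothing keeps the crop window from jittering.
            center = target if center is None else int(alpha * target + (1 - alpha) * center)
        centers.append(center if center is not None else frame.shape[1] // 2)
    cap.release()
    return centers
```

A 9:16 window would then be cut around each smoothed center with `ffmpeg`'s crop filter.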
- Download: Fetches the source video from YouTube
- Transcribe: MuAPI `/openai-whisper` produces a timestamped transcript (`verbose_json` segments)
- Detect content type: An LLM classifies the video (podcast, interview, tutorial, vlog, etc.) and density, so the prompt can be tuned per content style
- Long-video chunking: Videos > 30 min are split into 20-min overlapping chunks
- Highlight ranking: An LLM scans the transcript through a virality framework — hook moments, emotional peaks, opinion bombs, revelations, conflict, quotables, story peaks, practical value — and emits ranked candidates with scores 0–100
- Dedupe: Overlapping candidates are collapsed by score (>50% overlap → keep the higher score; see the sketch after this list)
- Top-N selection: The top `--num-clips` candidates are selected
- Auto-crop: Each highlight is rendered as a vertical short at the requested aspect ratio
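The dedupe rule is simple enough to sketch. This illustrates the ">50% overlap → keep the higher score" behavior, not the repo's exact code; in particular, measuring overlap against the shorter clip is an assumption:

```python
# Illustrative dedupe: drop any candidate that overlaps a higher-scoring
# kept clip by more than 50%. Candidates are dicts with start_time,
# end_time, and score, matching the JSON schema shown below.

def overlap_ratio(a: dict, b: dict) -> float:
    """Fraction of the shorter clip covered by the overlap (assumed metric)."""
    overlap = min(a["end_time"], b["end_time"]) - max(a["start_time"], b["start_time"])
    shorter = min(a["end_time"] - a["start_time"], b["end_time"] - b["start_time"])
    return max(0.0, overlap) / shorter if shorter > 0 else 0.0

def dedupe(candidates: list[dict]) -> list[dict]:
    kept: list[dict] = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if all(overlap_ratio(cand, k) <= 0.5 for k in kept):
            kept.append(cand)
    return kept
```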
Output: a list of mp4 URLs plus, for each clip, its title, viral score, hook sentence, and a one-line reason explaining why it should perform well.
Console output looks like:
```
========================================================================
 Highlights: 7 candidates → kept top 3
========================================================================
 #1  score=92   124.3s → 187.6s
     title: The one mistake that cost me $50K
     hook:  "Nobody talks about this, but it killed my first startup..."
     clip:  https://.../short_1.mp4

 #2  score=88  ...
```
`--output-json result.json` produces:

```json
{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ {...}, {...}, ... ],
  "shorts": [
    {
      "title": "...",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "...",
      "virality_reason": "...",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
```
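Because the schema is stable, downstream automation needs nothing beyond the standard library. A hypothetical consumer that grabs every rendered clip (the output file names here are made up):

```python
# Hypothetical downstream consumer of result.json — not part of the repo.
import json
import pathlib
import urllib.request

result = json.loads(pathlib.Path("result.json").read_text())
ranked = sorted(result["shorts"], key=lambda s: s["score"], reverse=True)
for i, short in enumerate(ranked, 1):
    print(f"#{i}  score={short['score']}  {short['title']}")
    urllib.request.urlretrieve(short["clip_url"], f"clip_{i:02d}.mp4")
```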
Edit `shorts_generator/highlights.py`:

- Virality framework: `VIRALITY_CRITERIA` — the ranked list of signals the LLM optimizes for
- System prompt: `HIGHLIGHT_SYSTEM_PROMPT` — duration sweet spot, hook rules, JSON schema
- Chunk size: `CHUNK_SIZE_SECONDS` (default 1200) — chunk length for long videos
- Long-video threshold: `LONG_VIDEO_THRESHOLD` (default 1800) — videos longer than this are chunked
- Chunk overlap: `CHUNK_OVERLAP_SECONDS` (default 60) — overlap between chunks so cross-boundary clips aren't missed
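The three chunking constants interact like this — an illustrative sketch using the defaults above, not the repo's exact code:

```python
# Illustrative chunking math: 1800s threshold, 1200s chunks, 60s overlap
# (the defaults listed above).

def chunk_spans(duration: float,
                threshold: float = 1800,
                chunk: float = 1200,
                overlap: float = 60) -> list[tuple[float, float]]:
    """Return (start, end) spans in seconds; short videos come back whole."""
    if duration <= threshold:
        return [(0.0, duration)]
    spans, start = [], 0.0
    while start < duration:
        spans.append((start, min(start + chunk, duration)))
        start += chunk - overlap  # step back by the overlap so boundary clips survive
    return spans

# A 45-minute (2700s) video → [(0.0, 1200.0), (1140.0, 2340.0), (2280.0, 2700.0)]
```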
Edit `shorts_generator/config.py` (or set env vars):

- `MUAPI_POLL_INTERVAL` (default 5s) — seconds between job-status polls
- `MUAPI_POLL_TIMEOUT` (default 1800s) — give up after this long
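These two settings drive a standard submit-and-poll loop like the one in `shorts_generator/muapi.py`. A hedged sketch — the status URL and response fields here are assumptions, not MuAPI's real API:

```python
# Hedged sketch of a poll loop governed by MUAPI_POLL_INTERVAL /
# MUAPI_POLL_TIMEOUT. The status field names are assumptions.
import os
import time

import requests

def poll_job(status_url: str) -> dict:
    interval = float(os.getenv("MUAPI_POLL_INTERVAL", 5))
    timeout = float(os.getenv("MUAPI_POLL_TIMEOUT", 1800))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = requests.get(status_url, timeout=30).json()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job still pending after {timeout}s")
```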
Audio is transcribed by MuAPI's `/openai-whisper` endpoint (server-side `whisper-1`). Pass `--language <code>` to lock the recognition to a specific language; otherwise it auto-detects.
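In local mode the same flag presumably ends up as the `language` argument to `faster-whisper`. A minimal illustration of that call (model size and device mirror `LOCAL_WHISPER_MODEL` / `LOCAL_WHISPER_DEVICE`):

```python
# Minimal faster-whisper call, for orientation — local/transcriber.py
# wraps something along these lines.
from faster_whisper import WhisperModel

model = WhisperModel("base", device="auto")  # LOCAL_WHISPER_MODEL / LOCAL_WHISPER_DEVICE
segments, info = model.transcribe("input.mp4", language="en")  # language=None → auto-detect
for seg in segments:
    print(f"{seg.start:7.1f} → {seg.end:7.1f}  {seg.text}")
```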
```
AI-Youtube-Shorts-Generator/
├── main.py                  CLI entry point
├── requirements.txt         core deps (api mode)
├── requirements-local.txt   optional deps for --mode local
├── .env.example
└── shorts_generator/
    ├── config.py            env / settings (MuAPI + local LLM + Whisper)
    ├── muapi.py             generic submit + poll wrapper
    ├── downloader.py        API mode: YouTube download via MuAPI
    ├── transcriber.py       API mode: MuAPI /openai-whisper client
    ├── highlights.py        shared LLM virality ranking (pluggable backend)
    ├── clipper.py           API mode: MuAPI /autocrop
    ├── pipeline.py          mode dispatcher (api ↔ local)
    └── local/               --mode local backends (offline)
        ├── downloader.py    yt-dlp download
        ├── transcriber.py   faster-whisper transcription
        ├── llm.py           OpenAI or Gemini client selector
        └── clipper.py       ffmpeg cut + OpenCV vertical crop
```
The video may have no detectable speech, or it may be in a language Whisper struggles with. Try passing `--language en` (or the correct ISO-639-1 code) to skip auto-detection.
The AI Clipping API uses an improved algorithm that produces higher-quality clips with better highlight detection.
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
