Use the @mux/ai library to automatically generate titles, descriptions, and tags for your videos via LLMs
Automatically generating video metadata like titles, descriptions, and tags helps you build better search experiences, improve content discovery, and save time on manual content curation. The @mux/ai library makes this straightforward by analyzing each video's transcript and storyboard images with an LLM.
Before starting, make sure you have:

- A Mux account with an API access token (token ID and secret)
- An API key for at least one supported AI provider (OpenAI, Anthropic, or Google)
- A Mux video asset to generate metadata for

Install the library:
```bash
npm install @mux/ai
```

Set your environment variables:

```bash
# Required
MUX_TOKEN_ID=your_mux_token_id
MUX_TOKEN_SECRET=your_mux_token_secret
# You only need the API key for the provider you're using
OPENAI_API_KEY=your_openai_api_key # OR
ANTHROPIC_API_KEY=your_anthropic_api_key # OR
GOOGLE_GENERATIVE_AI_API_KEY=your_google_api_key
```

To generate metadata, call `getSummaryAndTags` with your Mux asset ID:

```ts
import { getSummaryAndTags } from "@mux/ai/workflows";
const result = await getSummaryAndTags("your-mux-asset-id", {
  tone: "professional" // or "neutral" or "playful"
});

console.log(result.title);
// "How to Build a Video Platform in 2025"

console.log(result.description);
// "Learn the fundamentals of building a modern video platform..."

console.log(result.tags);
// ["video streaming", "web development", "tutorial", "javascript"]
```

You can control the style of the generated content with the `tone` option:

```ts
// Professional tone - formal and business-appropriate
const professional = await getSummaryAndTags("your-mux-asset-id", {
  tone: "professional"
});

// Neutral tone - balanced and conversational (default)
const neutral = await getSummaryAndTags("your-mux-asset-id", {
  tone: "neutral"
});

// Playful tone - lighthearted and engaging
const playful = await getSummaryAndTags("your-mux-asset-id", {
  tone: "playful"
});
```

Each tone produces noticeably different titles for the same video; we tested them against the same demo video of Mux's thumbnail API.
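If you want to compare tones yourself, a small loop does the job. This is a minimal sketch; the `tones` array and logging are illustrative, not part of the @mux/ai API:

```ts
import { getSummaryAndTags } from "@mux/ai/workflows";

// Generate a title with each tone for the same asset and print them side by side
const tones = ["professional", "neutral", "playful"] as const;

for (const tone of tones) {
  const { title } = await getSummaryAndTags("your-mux-asset-id", { tone });
  console.log(`${tone}: ${title}`);
}
```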
@mux/ai supports three AI providers:
- OpenAI: `gpt-5.1` model - Fast and cost-effective
- Anthropic: `claude-sonnet-4-5` model - Great for nuanced understanding
- Google: `gemini-3-flash-preview` model - Balance of speed and quality

```ts
const result = await getSummaryAndTags("your-mux-asset-id", {
  provider: "anthropic", // or "openai" or "google"
  model: "claude-opus-4-5" // Optional: override the provider's default model
});
```

By default, @mux/ai analyzes both the storyboard images and the transcript. Storyboard images are always included, but you can optionally exclude the transcript:

```ts
// Exclude the transcript (faster, uses only visual analysis)
const result = await getSummaryAndTags("your-mux-asset-id", {
  includeTranscript: false
});
```

You can override specific parts of the prompt to tune the output:

```ts
const result = await getSummaryAndTags("your-mux-asset-id", {
  promptOverrides: {
    system: "You are a video content specialist focused on technical tutorials.",
    instructions: "Create a title under 60 characters and exactly 5 tags focused on technical concepts."
  }
});
```

To automate metadata generation when videos are uploaded, call `getSummaryAndTags` from your handler for the `video.asset.track.ready` webhook, which fires once the auto-generated transcript is ready:

```ts
import { getSummaryAndTags } from "@mux/ai/workflows";

export async function handleWebhook(req, res) {
  const event = req.body;

  // Only act once the auto-generated English transcript (text track) is ready
  if (event.type === 'video.asset.track.ready' &&
      event.data.type === 'text' &&
      event.data.language_code === 'en') {
    const result = await getSummaryAndTags(event.data.asset_id, { tone: "professional" });
    // Persist the generated metadata alongside your video record
    await db.updateVideo(event.data.asset_id, { title: result.title, description: result.description, tags: result.tags });
  }

  res.status(200).end();
}
```

Once you have automatically generated metadata, you can use it to power search, recommendations, and content discovery across your application.
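For example, the generated tags and description slot naturally into a search index. Here's a minimal sketch, assuming a hypothetical `searchIndex.upsert` client standing in for whatever search backend you use:

```ts
import { getSummaryAndTags } from "@mux/ai/workflows";
// Hypothetical search client - swap in Algolia, Meilisearch, Postgres full-text search, etc.
import { searchIndex } from "./search";

export async function indexVideo(assetId: string) {
  const { title, description, tags } = await getSummaryAndTags(assetId);

  // Store the generated metadata so viewers can find the video by topic
  await searchIndex.upsert({ id: assetId, title, description, tags });
}
```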
Under the hood, @mux/ai handles fetching the transcript and storyboard images for your asset, sending them to your chosen AI provider, and parsing the response into a structured title, description, and tags.
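If you're curious what those inputs look like, here's a rough sketch of fetching them manually via Mux's standard storyboard and text-track endpoints. The playback and track IDs are placeholders, and this assumes the asset has a public playback ID:

```ts
const playbackId = "your-playback-id";
const trackId = "your-text-track-id";

// Storyboard: a single tiled image of frames sampled across the video
const storyboardUrl = `https://image.mux.com/${playbackId}/storyboard.png`;

// Transcript: the auto-generated text track, fetched as plain text
const response = await fetch(`https://stream.mux.com/${playbackId}/text/${trackId}.txt`);
const transcript = await response.text();
```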
If you're not sure which provider to choose, gpt-5.1 is cost-effective for most use cases.