How to add AI subtitles and chapters to videos on your Next.js site with ImageKit

A step-by-step tutorial for adding AI-generated subtitles, chapter markers, and multi-language translations to a Next.js video player using ImageKit's Video Player SDK. Covers auto transcription, word-level highlighting, auto and manual chapters, and translated caption tracks.

If you're building any kind of long-form video experience on a Next.js site, two features are no longer optional: subtitles and chapters. Subtitles are an accessibility requirement, an SEO signal, and the difference between a viewer watching with sound off and bouncing. Chapters are how viewers navigate a 12-minute lecture without scrubbing blindly through the timeline.

Producing them manually is expensive. Transcription services, VTT files to upload, chapter breakpoints to define and time, then the same work repeated for every translated language you support. For a course platform with hundreds of lessons or a podcast archive that grows weekly, the math doesn't work.

ImageKit's Video Player SDK ships AI-generated subtitles, AI-generated chapters, translations into other languages, and word-level highlighting as configuration on the source object. No transcription queue, no manual VTT files, no separate vendor.

In this tutorial we'll build a single lesson page for a fictional course platform. By the end, the video player will auto-generate English subtitles from the audio, auto-generate chapter markers, translate the subtitles into Spanish and French, and highlight each word as it's spoken.

What we'll cover:

  1. Install and configure the ImageKit Video Player SDK
  2. Render a basic lesson player
  3. Enable AI auto-generated subtitles with word highlighting
  4. Add multi-language translations
  5. Enable AI auto-generated chapters
  6. Add manual chapters when you need precise control
ℹ️

You'll need a Next.js 15+ project and an ImageKit account. The full source code is available on GitHub, and you can try the live demo without any local setup.

Setting up the ImageKit Video Player SDK

Install the package in a Next.js 15+ / React 18+ project:

npm install @imagekit/video-player
ℹ️

The companion demo includes a .npmrc file with legacy-peer-deps=true. The Player SDK's beta still declares React 17/18 in its peerDependencies (marked optional), and npm 10's resolver refuses the install without the flag, even though the library itself runs fine on React 19. If you're installing into your own Next.js 15+ project on React 19, add the same .npmrc at the project root; it's a one-time workaround you can remove once the SDK adds React 19 to its peer range.
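For reference, the workaround file is a single line at the project root:

```ini
# .npmrc
legacy-peer-deps=true
```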

Add your ImageKit ID to .env.local:

NEXT_PUBLIC_IMAGEKIT_ID=your-imagekit-id

You can find your ImageKit ID in the dashboard under the Developer options menu item. It's the path segment after ik.imagekit.io/.

The Player SDK uses browser APIs, so any component that renders it needs "use client" in the App Router. Here's a minimal lesson player to confirm the install works:

"use client";

import { IKVideoPlayer } from "@imagekit/video-player/react";
import "@imagekit/video-player/styles.css";

export default function LessonPlayer() {
  const ikOptions = {
    imagekitId: process.env.NEXT_PUBLIC_IMAGEKIT_ID!,
  };

  const source = {
    src: "https://ik.imagekit.io/your-id/lesson-01.mp4",
  };

  return <IKVideoPlayer ikOptions={ikOptions} source={source} />;
}

Drop this component into a page and you should see a working player with default controls. If the video doesn't load, double-check that the source URL is publicly accessible from your ImageKit account.

ℹ️

Don't have a video to test with? Use ImageKit's public sample at https://ik.imagekit.io/ikmedia/sample-video.mp4 and set NEXT_PUBLIC_IMAGEKIT_ID=ikmedia. That's exactly what the companion demo ships with so you can try the full flow without uploading anything.

Enabling AI auto-generated subtitles

Subtitles are a property on the source object, configured through the textTracks array. Each entry describes one subtitle track. Set autoGenerate: true and ImageKit runs speech-to-text on the audio and produces a VTT track automatically.

const source = {
  src: "https://ik.imagekit.io/your-id/lesson-01.mp4",
  textTracks: [
    {
      autoGenerate: true,
      default: true,
      maxChars: 60,
    },
  ],
};

That's the full subtitle setup. Press play and the captions render at the bottom of the player. The viewer can toggle them on or off using the captions button in the player controls.

Here's what each option does:

autoGenerate: Triggers speech-to-text on the source video. Default: false.
default: Whether the track is enabled by default. Set true for the primary language so captions appear without the viewer needing to enable them. Default: false.
maxChars: Maximum characters per subtitle line before the SDK breaks to a new line. 60 reads cleanly on desktop and mobile. Default: 60.

The first time the video is requested with autoGenerate: true, ImageKit's Video API transcribes it in the background. The track becomes available within seconds for short clips and within a minute for longer videos. Subsequent requests are served from cache instantly.

ℹ️

AI transcription costs 2 extension units per minute of source video. When you enable both auto-generated subtitles and auto-generated chapters, you're billed once — not separately for each. Each translated language adds another 2 units per minute. Check pricing before enabling at scale.
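To make that cost model concrete, here's a back-of-envelope helper using the rates quoted above. estimateExtensionUnits is our own name for illustration, not an SDK function, and the rates may change, so verify against ImageKit's current pricing page.

```typescript
// Back-of-envelope estimate of ImageKit AI extension units, using the rates
// quoted above: 2 units per source minute for transcription (auto subtitles
// and auto chapters are billed once together), plus 2 units per minute for
// each translated language. Verify against current pricing before relying on it.
function estimateExtensionUnits(
  videoMinutes: number,
  translationCount: number
): number {
  const transcription = 2 * videoMinutes;
  const translations = 2 * videoMinutes * translationCount;
  return transcription + translations;
}

// A 10-minute lesson with Spanish, French, and German tracks:
console.log(estimateExtensionUnits(10, 3)); // 80 units
```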

Adding karaoke-style word highlighting

Standard subtitles render a full line at a time. Word highlighting goes further: each word in the current subtitle line gets visually emphasized as it's spoken, which is particularly useful for educational content where active reading along helps comprehension and accessibility.

Enable it with one additional option:

textTracks: [
  {
    autoGenerate: true,
    default: true,
    maxChars: 60,
    highlightWords: true,
  },
],

That's it. The SDK uses word-level timing data from the speech recognition output to track playback against each word in the current line. If the SDK can't get word-level timestamps for the source language, it falls back gracefully to standard line-at-a-time rendering.

Note that highlightWords and maxChars apply to the original auto-generated transcription only. Translated tracks render without word-level highlighting.

Adding multi-language translations

If your audience is global, the same textTracks config can produce additional translated tracks alongside the source-language track. Each translation is one extra entry in the translations array.

textTracks: [
  {
    autoGenerate: true,
    default: true,
    maxChars: 60,
    highlightWords: true,
    translations: [
      { langCode: "es", label: "Spanish" },
      { langCode: "fr", label: "French" },
      { langCode: "de", label: "German" },
    ],
  },
],

Each translation appears as its own selectable subtitle track in the player's captions menu. The SDK uses the source-language transcription as the input to translation, so you only pay for transcription once and get the translated tracks as derivatives.

langCode: Two-letter ISO 639-1 language code. es for Spanish, fr for French, de for German, ja for Japanese, etc.
label: The display name shown in the captions menu. Use whatever makes sense for your UI.

For a course platform, translations are often the difference between a lesson reaching a regional audience or not. The cost of generating five translated tracks for a 10-minute lesson is a few cents. The cost of producing them manually is days of contractor time per language.

Enabling AI auto-generated chapters

Chapters work the same way as subtitles. Set chapters: true on the source object and ImageKit analyzes the video and produces chapter markers that appear on the progress bar.

const source = {
  src: "https://ik.imagekit.io/your-id/lesson-01.mp4",
  chapters: true,
  textTracks: [
    /* ...subtitle config from above */
  ],
};

The viewer hovers any chapter marker on the progress bar to see the chapter title, and clicks to jump to that section. The chapter list also appears in the player's chapter menu so viewers can scan the full lesson structure before deciding where to watch.

ImageKit's chapter detection works by analyzing the audio for natural topic breaks and visual changes in the video for scene boundaries. For a typical course lesson with clear topic transitions, this produces accurate chapter divisions without manual work.

Adding manual chapters when you need control

Auto chapters are great for speed, but they're not always right for content where you need exact section boundaries: a lesson with strict syllabus alignment, a podcast where chapters must match show notes, a conference talk where each demo needs its own marker. For these cases, the SDK accepts a chapters object with explicit timestamps.

const source = {
  src: "https://ik.imagekit.io/your-id/lesson-01.mp4",
  chapters: {
    0: "Introduction",
    90: "What are Server Components?",
    240: "Building your first component",
    480: "Common pitfalls",
    600: "Wrapping up",
  },
};

The format is an object where each key is a timestamp in seconds and the value is the chapter title. The SDK infers each chapter's end time from the next chapter's start time, and the final chapter runs to the end of the video.

Key (number): Seconds into the video where the chapter begins.
Value (string): Display label for the chapter, shown on hover and in the chapter menu.
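To make the end-time inference concrete, here's a sketch of what that expansion looks like. This mimics the behavior described above and is not the SDK's own code:

```typescript
// Expands a chapters object like the one above into explicit ranges:
// each chapter ends where the next begins, and the final chapter's end
// is null, meaning it runs to the end of the video. A sketch of the
// inference the SDK performs, not the SDK's internal implementation.
type ChapterRange = { start: number; end: number | null; title: string };

function toChapterRanges(chapters: Record<number, string>): ChapterRange[] {
  const starts = Object.keys(chapters)
    .map(Number)
    .sort((a, b) => a - b);
  return starts.map((start, i) => ({
    start,
    end: i + 1 < starts.length ? starts[i + 1] : null,
    title: chapters[start],
  }));
}

// toChapterRanges({ 0: "Introduction", 90: "What are Server Components?" })
// yields Introduction from 0s to 90s, then the second chapter from 90s
// until the video ends.
```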

When you pass an object, the SDK uses your chapters and skips automatic generation entirely. When you pass true, the SDK runs auto-detection. Pick one approach per video.

A practical pattern: use chapters: true for your back catalog where manual work isn't worth it, and pass an explicit object for high-value content where the chapter structure is part of the lesson design.
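One way to wire up that pattern, assuming a hypothetical Lesson record with an optional manualChapters field (neither the type nor the field is part of the SDK):

```typescript
// Picks the chapters value for a lesson: explicit chapters when an editor
// has supplied them, auto-detection otherwise. Lesson and manualChapters
// are hypothetical application types, not part of the ImageKit SDK.
type Lesson = {
  videoUrl: string;
  manualChapters?: Record<number, string>;
};

function chaptersFor(lesson: Lesson): true | Record<number, string> {
  return lesson.manualChapters ?? true;
}

const backCatalogLesson: Lesson = {
  videoUrl: "https://ik.imagekit.io/your-id/lesson-99.mp4",
};
const flagshipLesson: Lesson = {
  videoUrl: "https://ik.imagekit.io/your-id/lesson-01.mp4",
  manualChapters: { 0: "Introduction", 90: "What are Server Components?" },
};

// The back-catalog lesson gets chapters: true (auto-detection); the
// flagship lesson gets its hand-authored chapter object.
const sources = [backCatalogLesson, flagshipLesson].map((lesson) => ({
  src: lesson.videoUrl,
  chapters: chaptersFor(lesson),
}));
```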

ℹ️

The ImageKit Video Player SDK is currently in beta. Test thoroughly before using it in production, and pin to a specific version rather than latest.

Putting it together on a lesson page

The full lesson page is small: a header with the course context, the player, and a footer with navigation to the next lesson.

// app/page.tsx
import LessonPlayer from "@/components/LessonPlayer";

export default function LessonPage() {
  return (
    <main className="max-w-4xl mx-auto px-6 py-10">
      <nav className="text-sm text-zinc-500 mb-4">
        <a href="#" className="hover:text-zinc-900">
          React Server Components
        </a>
        <span className="mx-2">/</span>
        <span>Lesson 1</span>
      </nav>

      <header className="mb-8">
        <h1 className="text-3xl font-semibold">
          Introduction to React Server Components
        </h1>
        <p className="text-zinc-600 mt-2">12 minutes · Free preview</p>
      </header>

      <LessonPlayer />

      <footer className="mt-10 pt-6 border-t border-zinc-200 flex justify-between text-sm">
        <span className="text-zinc-400">Previous</span>
        <a href="#" className="text-zinc-900 font-medium hover:underline">
          Next: Data fetching →
        </a>
      </footer>
    </main>
  );
}

That's the whole tutorial. A working lesson page with AI subtitles, AI chapters, three translation tracks, and word highlighting in three files.

What we built

The lesson page now does what would otherwise take a transcription service, a translation pipeline, a chapter editor, and custom player code to wire them together. AI handles transcription and translation. The Player SDK handles rendering. You configured all of it in one source object.

For the broader picture of what the Video Player SDK supports beyond subtitles and chapters, the overview docs cover playlists, shoppable products, smart reframing, and floating playback, all configurable from the same source object you just used. The full reference for the auto-subtitle options used above lives in the auto subtitles guide.

Sign up for a free ImageKit account and try AI subtitles on a video you already have hosted somewhere.