@pzanella pzanella commented Oct 29, 2025

Related:

Fixes SRT subtitle parsing, where timestamps with comma-separated milliseconds (e.g., 00:00:01,000) fail to parse when the inline content comes from an indented template literal.
Closes #1671

Description:

This PR adds whitespace normalization to the #parseContent method in TextTrack to handle indented template literal content before passing it to the media-captions parser.

Changes:

  • Core Fix: Added content.split('\n').map(line => line.trim()).join('\n').trim() to normalize whitespace while preserving line structure
  • Test Coverage: Added comprehensive unit tests (16 test cases) covering whitespace normalization scenarios
  • Edge Cases: Handles mixed indentation, Windows line endings, tabs, and extremely large indentation
  • Backward Compatibility: Preserves empty lines between subtitle blocks and doesn't affect JSON content parsing
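For reviewers, the core fix can be tried in isolation. The sketch below wraps the same expression in a standalone function; the name normalizeCaptionWhitespace is hypothetical and used only for illustration, since the PR applies the expression inside the private #parseContent method.

```typescript
// Hypothetical standalone helper (name is an assumption, not part of the PR).
// It applies the same expression the PR adds to TextTrack's #parseContent.
function normalizeCaptionWhitespace(content: string): string {
  return content
    .split('\n')
    .map((line) => line.trim()) // strips spaces, tabs, and a trailing \r per line
    .join('\n')
    .trim(); // drops leading/trailing blank lines of the whole block
}

const indented = `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`;

const normalized = normalizeCaptionWhitespace(indented);
// normalized is '1\n00:00:01,000 --> 00:00:05,000\nFirst subtitle'
```

Trimming line by line, rather than trimming the whole string once, is what removes the indentation in front of the timestamp line.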

Problem Solved:
The media-captions library expects content without leading or trailing whitespace on each line. When inline SRT content is written as an indented template literal, the parser fails with errors like "cue start timestamp '00:00:00,000' is invalid" even though the SRT format itself is correct.

Ready?

Yes - All tests passing, edge cases covered, and backward compatibility maintained.

Anything Else?

Before (Failed):

const track = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ❌ Error: cue start timestamp `00:00:01,000` is invalid

After (Works):

const track = new TextTrack({
  kind: 'subtitles', 
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ✅ Parses successfully - whitespace normalized automatically

Test Results: 16/16 tests passing

| Test Category | Tests | Status | Coverage |
| --- | --- | --- | --- |
| Whitespace Normalization | 11 | ✅ Pass | Template literals, mixed indentation, leading/trailing whitespace |
| Edge Cases | 3 | ✅ Pass | Tabs, Windows line endings (\r\n), extreme indentation |
| Regression Tests | 2 | ✅ Pass | SRT comma timestamps, VTT period timestamps |

Detailed Coverage:

  • ✅ Template literal indentation scenarios
  • ✅ Mixed whitespace handling
  • ✅ Empty line preservation between subtitle blocks
  • ✅ JSON content bypass (no normalization applied)
  • ✅ Windows line endings (\r\n) compatibility
  • ✅ Tab characters and extreme indentation handling
  • ✅ SRT comma timestamps (00:00:01,000) parsing
  • ✅ VTT period timestamps (00:00:01.000) parsing
  • ✅ Complex content with HTML formatting and special characters
  • ✅ Single line content normalization
  • ✅ Whitespace-only content handling
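A few of the cases above can be checked by hand against the normalization expression alone. This is a minimal sketch, not the PR's actual test file: normalize is a hypothetical local copy of the expression, and the inputs are illustrative.

```typescript
// Hypothetical local copy of the expression from the core fix.
const normalize = (content: string): string =>
  content.split('\n').map((line) => line.trim()).join('\n').trim();

// Empty line preservation: an indented blank line becomes an empty line,
// so the separator between cue blocks survives.
const twoCues =
  '    1\n    00:00:01,000 --> 00:00:05,000\n    First\n    \n' +
  '    2\n    00:00:06,000 --> 00:00:10,000\n    Second';
const twoCuesOut = normalize(twoCues);

// Whitespace-only content collapses to an empty string.
const blankOut = normalize('   \n\t\n  ');

// Single-line content is trimmed on both sides.
const singleOut = normalize('   Hello   ');
```

Here twoCuesOut still contains one empty line between the two cue blocks, blankOut is the empty string, and singleOut is 'Hello'.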

Manual Test Scenarios:

Template Literal Testing

// Test 1: Basic indented SRT content
const track1 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle
    
    2
    00:00:06,000 --> 00:00:10,000
    Second subtitle`
});
// Expected: ✅ Parses successfully

Cross-platform Line Endings

// Test 2: Windows line endings (\r\n)
const windowsContent = "1\r\n00:00:01,000 --> 00:00:05,000\r\nSubtitle text";
const track2 = new TextTrack({
  kind: 'subtitles',
  type: 'srt', 
  content: `    ${windowsContent}`
});
// Expected: ✅ Handles \r\n correctly
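The \r\n case works because String.prototype.trim treats the carriage return as whitespace, so splitting on '\n' and trimming each line removes the leftover '\r'. A small sketch of that behavior, using a hypothetical local copy of the expression:

```typescript
// Hypothetical local copy of the expression from the core fix.
const normalize = (content: string): string =>
  content.split('\n').map((line) => line.trim()).join('\n').trim();

const windowsContent = '1\r\n00:00:01,000 --> 00:00:05,000\r\nSubtitle text';
const out = normalize(`    ${windowsContent}`);

// After normalization every line ending is a bare \n.
const hasCarriageReturn = out.indexOf('\r') !== -1; // false
```

One consequence is that the normalized content always uses \n line endings, regardless of what the input used.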

JSON Content Bypass

// Test 3: JSON content should skip normalization
const track3 = new TextTrack({
  kind: 'subtitles',
  type: 'json',
  content: {
    cues: [{ startTime: 1, endTime: 5, text: "Test" }]
  }
});
// Expected: ✅ No normalization applied to JSON
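One way such a bypass could look is a type guard that only touches string content. This is a sketch under assumptions: the CaptionContent type and the prepareContent function are hypothetical names, and the actual check inside #parseContent may differ.

```typescript
// Hypothetical types and function names, used only for illustration.
interface JsonCue {
  startTime: number;
  endTime: number;
  text: string;
}
type CaptionContent = string | { cues: JsonCue[] };

function prepareContent(content: CaptionContent): CaptionContent {
  // Object (JSON) content is returned untouched; only strings are normalized.
  if (typeof content !== 'string') return content;
  return content.split('\n').map((line) => line.trim()).join('\n').trim();
}

const jsonContent = { cues: [{ startTime: 1, endTime: 5, text: 'Test' }] };
const sameObject = prepareContent(jsonContent) === jsonContent; // true
const srtText = prepareContent('    1\n    00:00:01,000 --> 00:00:05,000\n    Test');
```

Returning the object by reference means no copy is made and cue text inside JSON content keeps any whitespace it had.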

VTT Format Testing

// Test 4: VTT with period timestamps
const track4 = new TextTrack({
  kind: 'subtitles',
  type: 'vtt',
  content: `    WEBVTT
    
    1
    00:00:01.000 --> 00:00:05.000
    VTT subtitle`
});
// Expected: ✅ VTT format works with periods

Edge Case Validation

// Test 5: Mixed whitespace (tabs + spaces)
const mixedContent = "\t  1\n\t00:00:01,000 --> 00:00:05,000\n\t  Mixed whitespace";
const track5 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: mixedContent
});
// Expected: ✅ Handles mixed tab/space indentation


Development

Successfully merging this pull request may close these issues.

Support srt file format
