Skip to content

[Test for Claude Code, Draft] Add speaker identification system to espnet3#6375

Open
sw005320 wants to merge 5 commits into
espnet:masterfrom
sw005320:spk_system
Open

[Test for Claude Code, Draft] Add speaker identification system to espnet3#6375
sw005320 wants to merge 5 commits into
espnet:masterfrom
sw005320:spk_system

Conversation

@sw005320
Copy link
Copy Markdown
Contributor

Summary

  • Add SPKSystem to espnet3/systems/spk/ by porting the espnet2 spk task into the espnet3 framework, following the same patterns established by ASRSystem
  • Add SpeakerTask (direct port of espnet2/tasks/spk.py) reusing all espnet2 spk encoders, pooling layers, projectors, and loss functions
  • Add EER metric extending AbsMetric, computing EER (%) and minDCF via espnet2.utils.eer

Test plan

  • Unit tests added in test/espnet3/systems/spk/test_system.py
    • SPKSystem initialization and stage behavior
    • create_dataset with/without config
    • train delegation to base system
    • EER metric computation and file output
    • SpeakerTask class choices and data name requirements
  • Verify tests pass in a full ESPnet environment with all dependencies installed

🤖 Generated with Claude Code

Port the espnet2 spk task into the espnet3 framework as SPKSystem,
following the same patterns established by ASRSystem.

- espnet3/systems/spk/system.py: SPKSystem extending BaseSystem with
  config-driven create_dataset stage (no tokenizer step required)
- espnet3/systems/spk/task.py: SpeakerTask ported from espnet2/tasks/spk.py,
  reusing all espnet2 spk encoders, pooling, projectors, and loss functions
- espnet3/systems/spk/metrics/eer.py: EER metric extending AbsMetric,
  computing EER (%) and minDCF using espnet2.utils.eer
- test/espnet3/systems/spk/test_system.py: unit tests for SPKSystem,
  SpeakerTask, and EER metric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. ESPnet3 New Features SID Speaker identification/embedding labels Feb 24, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a speaker identification system (SPKSystem) into espnet3 by porting the espnet2 speaker task. It includes a new SpeakerTask, an EER metric, and corresponding tests. The overall structure is well-organized and aligns with existing patterns in the codebase. I have identified two critical issues that could lead to runtime crashes. One is a missing validation for input_size in SpeakerTask.build_model when a frontend is not specified. The other is a potential ZeroDivisionError during the EER metric calculation if all trial labels are identical. Addressing these issues will enhance the robustness of the new system.

Comment on lines +78 to +81
fnrs, fprs, thresholds = ComputeErrorRates(scores, labels)
min_dcf, _ = ComputeMinDcf(
fnrs, fprs, thresholds, self.p_target, self.c_miss, self.c_fa
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The ComputeErrorRates function from espnet2.utils.eer can raise a ZeroDivisionError if all labels in the input are the same (i.e., all target or all non-target). This will cause a crash. You should add error handling to gracefully manage this edge case, for example by using a try...except ZeroDivisionError block and assigning a sensible default value to min_dcf.

input_size = frontend.output_size()
else:
frontend = None
input_size = args.input_size
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

If args.frontend is None, input_size is set to args.input_size. However, args.input_size can also be None (as it defaults to None), which will likely cause a crash later when it is passed to the encoder's constructor, which expects an integer. You should add a check to ensure input_size is not None when no frontend is used.

            input_size = args.input_size
            if input_size is None:
                raise ValueError("input_size must be specified if frontend is not used.")

Shinji Watanabe and others added 3 commits February 24, 2026 15:24
Remove unused Path, MagicMock, and projector_choices imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
os.path.join does not accept keyword arguments, so the test was
failing. Replace with os.path.exists which accepts 'path' as a
keyword argument.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 73.07692% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.63%. Comparing base (9215b8d) to head (8488dc2).
⚠️ Report is 102 commits behind head on master.

Files with missing lines Patch % Lines
espnet3/systems/spk/task.py 61.71% 49 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #6375    +/-   ##
========================================
  Coverage   69.62%   69.63%            
========================================
  Files         775      780     +5     
  Lines       71542    71724   +182     
========================================
+ Hits        49813    49944   +131     
- Misses      21729    21780    +51     
Flag Coverage Δ
test_integration_espnet2 46.88% <ø> (ø)
test_python_espnet2 60.89% <0.00%> (-0.16%) ⬇️
test_python_espnet3 17.33% <73.07%> (+0.19%) ⬆️
test_utils 60.89% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sw005320 sw005320 changed the title Add speaker identification system to espnet3 [Test for Claude Code, Draft] Add speaker identification system to espnet3 Feb 25, 2026
@Fhrozen Fhrozen added this to the v.202601 milestone Mar 2, 2026
@Fhrozen Fhrozen modified the milestones: v.202604, v.202607 Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ESPnet3 New Features SID Speaker identification/embedding size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants