[Test for Claude Code, Draft] Add speaker identification system to espnet3 #6375
sw005320 wants to merge 5 commits into
Port the espnet2 spk task into the espnet3 framework as SPKSystem, following the same patterns established by ASRSystem.

- espnet3/systems/spk/system.py: SPKSystem extending BaseSystem with config-driven create_dataset stage (no tokenizer step required)
- espnet3/systems/spk/task.py: SpeakerTask ported from espnet2/tasks/spk.py, reusing all espnet2 spk encoders, pooling, projectors, and loss functions
- espnet3/systems/spk/metrics/eer.py: EER metric extending AbsMetric, computing EER (%) and minDCF using espnet2.utils.eer
- test/espnet3/systems/spk/test_system.py: unit tests for SPKSystem, SpeakerTask, and EER metric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
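For readers unfamiliar with the two quantities the EER metric reports, the following is a minimal pure-Python sketch of how EER (%) and normalized minDCF are commonly defined. It is not the espnet2.utils.eer implementation; the function names `error_rates`, `eer_percent`, and `min_dcf` and the default cost parameters are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def error_rates(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, FNR, FPR) for every candidate threshold.

    A trial is accepted when score >= threshold. Labels are 1 (target)
    or 0 (non-target). Illustrative only, not espnet2's code.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    points = []
    # Sweep each distinct score, plus +inf so "reject everything" is included.
    for t in sorted(set(scores)) + [float("inf")]:
        fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        points.append((t, fn / n_target, fp / n_nontarget))
    return points


def eer_percent(points: List[Tuple[float, float, float]]) -> float:
    """Approximate EER at the threshold where |FNR - FPR| is smallest."""
    _, fnr, fpr = min(points, key=lambda p: abs(p[1] - p[2]))
    return 100.0 * (fnr + fpr) / 2.0


def min_dcf(
    points: List[Tuple[float, float, float]],
    p_target: float = 0.05,
    c_miss: float = 1.0,
    c_fa: float = 1.0,
) -> float:
    """Minimum detection cost, normalized by the best trivial system."""
    best = min(
        c_miss * p_target * fnr + c_fa * (1.0 - p_target) * fpr
        for _, fnr, fpr in points
    )
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    pts = error_rates([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
    print(eer_percent(pts), min_dcf(pts, p_target=0.5))
```

The sketch sweeps thresholds exhaustively, which is quadratic in the number of trials; a production implementation would sort the scores once and accumulate counts instead.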
Code Review
This pull request introduces a speaker identification system (SPKSystem) into espnet3 by porting the espnet2 speaker task. It includes a new SpeakerTask, an EER metric, and corresponding tests. The overall structure is well-organized and aligns with existing patterns in the codebase. I have identified two critical issues that could lead to runtime crashes. One is a missing validation for input_size in SpeakerTask.build_model when a frontend is not specified. The other is a potential ZeroDivisionError during the EER metric calculation if all trial labels are identical. Addressing these issues will enhance the robustness of the new system.
fnrs, fprs, thresholds = ComputeErrorRates(scores, labels)
min_dcf, _ = ComputeMinDcf(
    fnrs, fprs, thresholds, self.p_target, self.c_miss, self.c_fa
)
The ComputeErrorRates function from espnet2.utils.eer can raise a ZeroDivisionError if all labels in the input are the same (i.e., all target or all non-target). This will cause a crash. You should add error handling to gracefully manage this edge case, for example by using a try...except ZeroDivisionError block and assigning a sensible default value to min_dcf.
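The guard described in this comment could look roughly like the sketch below. To stay self-contained it uses a toy stand-in, `toy_miss_and_false_alarm_rates`, rather than the real ComputeErrorRates; both that function and the wrapper `safe_rates` are hypothetical names, and NaN is only one possible choice of default.

```python
import math
from typing import Sequence, Tuple


def toy_miss_and_false_alarm_rates(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Toy stand-in for an error-rate computation (NOT espnet2's code)."""
    n_target = sum(1 for l in labels if l == 1)
    n_nontarget = len(labels) - n_target
    misses = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    false_alarms = sum(
        1 for s, l in zip(scores, labels) if l == 0 and s >= threshold
    )
    # Either division raises ZeroDivisionError when one class is absent.
    return misses / n_target, false_alarms / n_nontarget


def safe_rates(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    default: float = float("nan"),
) -> Tuple[float, float]:
    """Return the rates, or a default pair when all labels are identical."""
    try:
        return toy_miss_and_false_alarm_rates(scores, labels, threshold)
    except ZeroDivisionError:
        return default, default


if __name__ == "__main__":
    # Mixed labels: the rates are well defined.
    print(safe_rates([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0], 0.5))
    # All-target labels: the guard returns the default instead of crashing.
    print(safe_rates([0.9, 0.4], [1, 1], 0.5))
```

Returning NaN keeps the degenerate case visible in the reported numbers; a default of 0.0 would make an unscorable trial list look like a perfect system.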
    input_size = frontend.output_size()
else:
    frontend = None
    input_size = args.input_size
If args.frontend is None, input_size is set to args.input_size. However, args.input_size can also be None (as it defaults to None), which will likely cause a crash later when it is passed to the encoder's constructor, which expects an integer. You should add a check to ensure input_size is not None when no frontend is used.
input_size = args.input_size
if input_size is None:
    raise ValueError("input_size must be specified if frontend is not used.")

Remove unused Path, MagicMock, and projector_choices imports. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
os.path.join does not accept keyword arguments, so the test was failing. Replace with os.path.exists which accepts 'path' as a keyword argument. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #6375     +/-   ##
=========================================
  Coverage   69.62%   69.63%
=========================================
  Files         775      780      +5
  Lines       71542    71724    +182
=========================================
+ Hits        49813    49944    +131
- Misses      21729    21780     +51
Summary
- Add SPKSystem to espnet3/systems/spk/ by porting the espnet2 spk task into the espnet3 framework, following the same patterns established by ASRSystem
- SpeakerTask (direct port of espnet2/tasks/spk.py) reusing all espnet2 spk encoders, pooling layers, projectors, and loss functions
- EER metric extending AbsMetric, computing EER (%) and minDCF via espnet2.utils.eer

Test plan
test/espnet3/systems/spk/test_system.py
- SPKSystem initialization and stage behavior
- create_dataset with/without config
- train delegation to base system
- EER metric computation and file output
- SpeakerTask class choices and data name requirements

🤖 Generated with Claude Code