Update hubert doc #6396

Open
NewGamezzz wants to merge 7 commits into espnet:master from NewGamezzz:update_hubert_doc

@NewGamezzz
Contributor

What did you change?

  • Added a detailed breakdown of the HuBERT recipe flow, including feature dumping and K-means clustering logic.
  • Documented the DiceHuBERT distillation method, explaining how to use pretrained models to skip the early training iterations.
  • Added an Evaluation section for the SUPERB benchmark, using the S3PRL upstream espnet_hubert_local.
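
For readers unfamiliar with the K-means step mentioned in the first bullet, here is a toy sketch of the general idea: cluster dumped frame-level features, then use each frame's cluster ID as its pseudo-label. It is not the recipe's code. The function names `fit_kmeans` and `assign_labels` are hypothetical, the features are synthetic, and the evenly spaced centroid initialization is a simplification chosen to keep the example deterministic and dependency-free.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_labels(frames, centroids):
    """Return the index of the nearest centroid for each frame."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(frame, centroids[k]))
        for frame in frames
    ]


def fit_kmeans(frames, n_clusters, n_iters=10):
    """Plain Lloyd's algorithm (hypothetical helper, not the recipe's API)."""
    # Simplified deterministic init: take evenly spaced frames as centroids.
    step = len(frames) // n_clusters
    centroids = [list(frames[i * step]) for i in range(n_clusters)]
    for _ in range(n_iters):
        labels = assign_labels(frames, centroids)
        for k in range(n_clusters):
            members = [f for f, lab in zip(frames, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Synthetic stand-in for dumped features: two well-separated groups of frames.
rng = random.Random(1)
low = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(30)]
high = [[rng.gauss(5.0, 0.1) for _ in range(4)] for _ in range(30)]
frames = low + high

centroids = fit_kmeans(frames, n_clusters=2)
labels = assign_labels(frames, centroids)  # one pseudo-label per frame
```

In this toy setup every frame in the first group receives one cluster ID and every frame in the second group receives the other. A real recipe would typically use a scalable K-means implementation over far more frames and clusters.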

Why did you make this change?

The current HuBERT documentation does not describe what each stage does, which makes the script hard to follow. These updates give researchers clear instructions for replicating the HuBERT and DiceHuBERT results within the ESPnet2 framework.


Is your PR small enough?

Yes.


dosubot (bot) added the labels size:L (This PR changes 100-499 lines, ignoring generated files), Documentation, ESPnet2, and SSL (self-supervised learning) on Mar 25, 2026
mergify (bot) added the README label on Mar 25, 2026
Contributor

gemini-code-assist (bot) left a comment

Code Review

This pull request significantly expands the README.md for the HuBERT recipe, providing a detailed workflow, information on DiceHuBERT distillation, and evaluation guidelines. It also introduces a new YAML configuration file for HuBERT distillation training. Feedback includes correcting an invalid arXiv link for DiceHuBERT and replacing a hardcoded Python executable path with a generic command in the README.md examples.

Comment thread egs2/TEMPLATE/hubert1/README.md
Comment thread egs2/TEMPLATE/hubert1/README.md Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sw005320
Contributor

Thanks, @NewGamezzz!

@wanchichen, can you review this PR?

@codecov

codecov Bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.32%. Comparing base (2d059f5) to head (a651223).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6396   +/-   ##
=======================================
  Coverage   70.32%   70.32%           
=======================================
  Files         787      787           
  Lines       73651    73651           
=======================================
  Hits        51794    51794           
  Misses      21857    21857           
Flag Coverage Δ
test_integration_espnet2 46.85% <ø> (ø)
test_python_espnet2 61.39% <ø> (ø)
test_python_espnet3 17.56% <ø> (ø)

@Fhrozen
Member

Fhrozen commented Mar 30, 2026

@claude review

Fhrozen added this to the v.202607 milestone on Apr 7, 2026
@sw005320
Contributor

@wanchichen, this is a reminder
