Mine hard negatives: optionally output similarity scores #3506
tomaarsen merged 9 commits into huggingface:main
|
Hello! I think it's indeed a good idea to also allow exporting scores, but so far I've introduced that via the If we instead add a parameter akin to
|
4053087 to e0ac98e
e0ac98e to 61f4dc4
|
Hi, did you decide on what to do with
|
Apologies for the delay. I think it would be preferable indeed to move towards I want to share that I'll be taking 3 weeks off starting Monday, so I won't be able to move this PR forward in the meantime. Apologies for this.
|
Okay, I've implemented this
…r both And consider "scores" and "labels" special label columns for all model archetypes, not just CrossEncoder
|
@tsbalzhanov
So, I've made it so it's either I think the current implementation gives you all the outputs that you might want, while also working nicely out of the box with the Sentence Transformers trainers etc.
|
|
I think using scores without labels defeats the purpose of using them in the first place, because we need to be able to distinguish between hard negatives and positives when training a cross encoder. In case of Is there some way to have both scores and labels included in the output?
|
Thanks for your considered response. I agree completely that more information is almost always better, but this time it contradicts one of the goals of I also agree that the "gold" (human-annotated) positives vs negatives (i.e. Do you know of a situation where you'd need both columns?
|
Pull request overview
This PR adds an optional output_scores parameter to the mine_hard_negatives function, allowing users to include similarity scores in the output dataset alongside the mined hard negatives. This enables fine-tuning mining parameters without recalculating scores and supports extracting selection logic outside the mining function. The PR also deprecates the n-tuple-scores output format in favor of using n-tuple with output_scores=True.
Key changes:
- Added output_scores parameter to optionally include similarity scores in all output formats
- Deprecated n-tuple-scores format with a migration path to n-tuple + output_scores=True
- Updated data collator to recognize "labels" and "scores" as valid label columns
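The migration path in the key changes can be pictured as a small argument-normalization step. The following is a hypothetical sketch only, not the code in sentence_transformers/util/hard_negatives.py: the helper name `normalize_output_args` and the warning text are assumptions made for illustration.

```python
import warnings
from typing import Tuple


def normalize_output_args(output_format: str, output_scores: bool = False) -> Tuple[str, bool]:
    """Map the deprecated "n-tuple-scores" format onto "n-tuple" + output_scores=True.

    Hypothetical helper for illustration; not part of Sentence Transformers.
    """
    if output_format == "n-tuple-scores":
        # Warn the caller, then translate to the replacement arguments.
        warnings.warn(
            'output_format="n-tuple-scores" is deprecated; '
            'use output_format="n-tuple" with output_scores=True instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return "n-tuple", True
    # Any other format passes through unchanged.
    return output_format, output_scores
```

Under this sketch, callers still passing the old format keep working and are pointed at the new spelling, while every other format is left untouched.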
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sentence_transformers/util/hard_negatives.py | Implements output_scores parameter, deprecates n-tuple-scores format, and adds score extraction logic for all output formats |
| tests/util/test_hard_negatives.py | Adds comprehensive test coverage for output_scores parameter across all output formats and validates deprecated format behavior |
| sentence_transformers/data_collator.py | Extends valid label columns to include "labels" and "scores" alongside existing "label" and "score" |
| docs/sentence_transformer/training_overview.md | Updates documentation to reflect new valid label column names |
| docs/cross_encoder/training_overview.md | Removes obsolete comparison note about label column differences |
| docs/cross_encoder/loss_overview.md | Documents ability to output similarity scores instead of binary labels |
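The data collator change in the table above amounts to widening the set of column names that are treated as labels rather than model inputs. A minimal sketch of that idea, assuming a simple name-based check; the function `split_label_columns` is hypothetical and the actual collator logic is not shown in this PR summary:

```python
from typing import List, Tuple

# Column names treated as labels after this PR, per the file summary above.
VALID_LABEL_COLUMNS = ("label", "score", "labels", "scores")


def split_label_columns(column_names: List[str]) -> Tuple[List[str], List[str]]:
    """Split dataset columns into (input_columns, label_columns), preserving order.

    Hypothetical helper for illustration; not the Sentence Transformers collator.
    """
    label_columns = [name for name in column_names if name in VALID_LABEL_COLUMNS]
    input_columns = [name for name in column_names if name not in VALID_LABEL_COLUMNS]
    return input_columns, label_columns


# A mined dataset with a "scores" column: only the text columns remain inputs.
inputs, labels = split_label_columns(["query", "positive", "negative", "scores"])
```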
Hello
This PR adds an option to include similarity scores in the result of the mine_hard_negatives function.
This option might be helpful for fine-tuning the parameters of the mining function without needing to recalculate scores, or for moving the logic that selects negatives outside of the mining function.
Tsyren Balzhanov
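To make the motivation concrete, here is a rough sketch of re-selecting negatives from scores that were already stored, so that thresholds can be tuned without re-running the model. The input layout (a positive score plus a list of (text, score) candidates) and the function `reselect_negatives` are assumptions for this sketch; the parameter names are chosen for illustration and the exact columns produced with output_scores=True may differ.

```python
from typing import List, Optional, Tuple


def reselect_negatives(
    positive_score: float,
    candidates: List[Tuple[str, float]],
    max_score: Optional[float] = None,
    absolute_margin: Optional[float] = None,
    num_negatives: int = 3,
) -> List[str]:
    """Pick the highest-scoring candidates that pass the thresholds, using stored scores only."""
    kept = []
    for text, score in candidates:
        if max_score is not None and score > max_score:
            continue  # too similar to the query: likely a false negative
        if absolute_margin is not None and score > positive_score - absolute_margin:
            continue  # not far enough below the positive's score
        kept.append((text, score))
    # Hardest negatives first: highest remaining similarity.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:num_negatives]]


# Stored (text, score) pairs for one query whose positive scored 0.9.
stored = [("a", 0.95), ("b", 0.80), ("c", 0.60), ("d", 0.40)]
strict = reselect_negatives(0.9, stored, absolute_margin=0.2)            # ["c", "d"]
loose = reselect_negatives(0.9, stored, max_score=0.9, num_negatives=2)  # ["b", "c"]
```

Because only the stored scores are consulted, different thresholds can be compared cheaply on the same mined candidates.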