Skip to content

Refactor evaluation #96

Merged
alessiodevoto merged 32 commits intomainfrom
aledev/eval
Jul 9, 2025
Merged

Refactor evaluation #96
alessiodevoto merged 32 commits intomainfrom
aledev/eval

Conversation

@alessiodevoto
Copy link
Contributor

PR description

Description of your PR. Fixes # (issue) (if applicable)

Checklist

  • Tests are working (make test)
  • Code is formatted correctly (make style, on errors try fix with make format)
  • Copyright header is included
  • All commits are signed-off using git commit -s
  • (new press) mypress_press.py is in the presses directory
  • (new press) MyPress is in __init__.py
  • (new press) README.md is updated with a 1 liner about the new press in the Available presses section
  • (new press) New press is in the default_presses list in tests/default_presses.py
  • (new press) A docstring is provided that follows the same structure as the existing ones

Details

Addresses #89 with the following changes/tests:

  • Moved all datasets to the benchmarks dir to prevent clutter (all files left unchanged)
  • Refactored the code for evaluation to work both with yaml file and CLI args (see evaluation.py)
  • Added instructions to README
  • Tested all presses on a single sample from ruler

Additional change

  • minimal change to QFilterPress that was not loading q_filter files correctly due to a mismatch in model names (Meta-Llama -> Llama)

maxjeblick and others added 29 commits July 3, 2025 12:38
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: alessiodevoto <adevoto@nvidia.com>
Signed-off-by: Maximilian Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: alessiodevoto <adevoto@nvidia.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Copy link
Collaborator

@maxjeblick maxjeblick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR, overall, it looks really good!
I have two suggestions:

  1. Create a dedicated Config class instead of using a dictionary. This could improve readability a bit.
  2. Instead of encoding the config settings in the dataframe/metrics name, create a folder that also saves the config. By this, one can easily load the config.yaml again for downstream processing.

Item 1. is a potential refactoring; 2. should probably be implemented, as the current implementation is a bit hacky.

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@alessiodevoto
Copy link
Contributor Author

Thanks, the dataclass is way better indeed! I implemented it and changed the output directory pattern. Now it is like:

<dir name from configs>/
    ├── predictions.csv
    ├── metrics.json
    └── eval_config.yaml

@alessiodevoto
Copy link
Contributor Author

I put the dataclass code in the evaluation.py file, maybe we can move it somewhere else (just wanted to avoid too many files)

Copy link
Collaborator

@maxjeblick maxjeblick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot LGTM!

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Copy link
Collaborator

@maxjeblick maxjeblick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot LGTM!

@alessiodevoto alessiodevoto merged commit 4e559d3 into main Jul 9, 2025
3 checks passed
@alessiodevoto alessiodevoto deleted the aledev/eval branch July 9, 2025 15:03
maxjeblick pushed a commit that referenced this pull request Aug 12, 2025
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants