
Conversation

@cnotley commented on Oct 28, 2025

Reference Issues/PRs

Fixes #31599

What does this implement/fix? Explain your changes.

Bug before this change

_MultimetricScorer.__call__ iterates over the scorers in self._scorers and forwards sample_weight to each scorer if appropriate.

Before this change, the code unconditionally did something like:

accepts = scorer._accept_sample_weight()

This assumes that every scorer in self._scorers is a _BaseScorer instance with an _accept_sample_weight attribute.

However, users can (and do) pass a mix of scorer types, for example:

  • a normal scorer created via make_scorer (i.e. a _BaseScorer subclass), and
  • a plain callable (function) that computes a custom score.

Plain callables do not define _accept_sample_weight. In that case, calling _accept_sample_weight() raised:

AttributeError: 'function' object has no attribute '_accept_sample_weight'

This means that evaluating multiple metrics at once with sample_weight=... could crash at runtime if any of those metrics was provided as a bare function instead of a scorer object.
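
For illustration, here is a minimal sketch of the failing scenario, assuming a made-up custom metric and a toy dataset (none of these names come from the original report):

import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.metrics._scorer import _MultimetricScorer

X, y = make_classification(random_state=0)
sample_weight = np.ones(len(y))
estimator = LogisticRegression().fit(X, y)

def custom_metric(estimator, X, y):
    # A plain callable scorer: it has no _accept_sample_weight attribute.
    return accuracy_score(y, estimator.predict(X))

scorers = {
    "accuracy": make_scorer(accuracy_score),  # _BaseScorer subclass
    "custom": custom_metric,                  # bare function
}
multimetric = _MultimetricScorer(scorers=scorers)

# Before this change, this call raised AttributeError because of the bare
# function; after it, both metrics are computed and only "accuracy" receives
# sample_weight.
scores = multimetric(estimator, X, y, sample_weight=sample_weight)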

Behavior after this change

We now guard access to _accept_sample_weight().

For each scorer in the multimetric scorer:

  • If the scorer defines _accept_sample_weight(), we call it and (as before) forward sample_weight only if it returns True.
  • If the scorer does not define _accept_sample_weight (e.g. it's just a plain callable), we treat it as not accepting sample_weight instead of raising.

In code terms (simplified for illustration):

try:
    # Real scorer objects (_BaseScorer subclasses) expose _accept_sample_weight().
    accepts = scorer._accept_sample_weight()
except AttributeError:
    # Plain callables do not define it; treat them as not accepting sample_weight.
    accepts = False

if accepts:
    routed_params[name]["sample_weight"] = sample_weight

This preserves the existing behavior for real scorer objects, while preventing the AttributeError for plain callables. The result is that mixed-metric evaluation with sample_weight no longer crashes.

Tests

This PR adds a non-regression test:
test_multimetric_scorer_mixed_callable_no_attribute_error_with_sample_weight
in sklearn/metrics/tests/test_score_objects.py.

The test does the following:

  • Builds a dummy estimator.
  • Builds one scorer that is a _BaseScorer (and therefore supports _accept_sample_weight) and another scorer that is just a plain function with no _accept_sample_weight.
  • Constructs an instance of _MultimetricScorer combining both scorers.
  • Calls that multimetric scorer with sample_weight=....

The assertions are:

  • No AttributeError (or other exception) is raised when calling with sample_weight.
  • The returned result is a dict containing numeric scores for both metrics.

The test also imports _BaseScorer from sklearn.metrics._scorer so we can explicitly build the mixed scorer setup.
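
For reference, a hedged sketch of what such a test might look like (the exact data, estimator, and assertions in the PR may differ):

import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.metrics._scorer import _BaseScorer, _MultimetricScorer


def test_multimetric_scorer_mixed_callable_no_attribute_error_with_sample_weight():
    X, y = make_classification(random_state=42)
    sample_weight = np.ones(len(y))
    estimator = LogisticRegression().fit(X, y)

    def plain_callable(estimator, X, y):
        # Deliberately not a _BaseScorer: defines no _accept_sample_weight.
        return accuracy_score(y, estimator.predict(X))

    scorer = make_scorer(accuracy_score)
    assert isinstance(scorer, _BaseScorer)

    multimetric = _MultimetricScorer(
        scorers={"accuracy": scorer, "custom": plain_callable}
    )

    # Must not raise AttributeError when sample_weight is passed.
    scores = multimetric(estimator, X, y, sample_weight=sample_weight)

    assert set(scores) == {"accuracy", "custom"}
    assert all(isinstance(value, float) for value in scores.values())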

User impact

  • Before: valid code using a mix of scorer styles could raise at runtime when passing sample_weight.
  • After: that code now runs without error.
  • Behavior for scorers that already support sample_weight is unchanged.
  • Behavior for scorers that do not support sample_weight is also unchanged (they simply do not receive sample_weight), but we no longer crash in that case.

This is therefore a robustness/correctness fix to _MultimetricScorer.__call__, plus a non-regression test to guard against this behavior breaking again.

Any other comments?

  • This change should be backward compatible.
  • Because this avoids a runtime crash in user code, it may warrant a short changelog entry under bug fixes in doc/whats_new/upcoming_changes/.

@github-actions

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/metrics/_scorer.py:163:89: E501 Line too long (92 > 88)
    |
161 |                         accepts = False
162 |                     if accepts:
163 |                         routed_params[name].score["sample_weight"] = kwargs["sample_weight"]
    |                                                                                         ^^^^ E501
164 |
165 |         for name, scorer in self._scorers.items():
    |

sklearn/metrics/tests/test_score_objects.py:1:1: I001 [*] Import block is un-sorted or un-formatted
   |
 1 | / import numbers
 2 | | import pickle
 3 | | import re
 4 | | from copy import deepcopy
 5 | | from functools import partial
 6 | |
 7 | | import joblib
 8 | | import numpy as np
 9 | | import pytest
10 | | from numpy.testing import assert_allclose
11 | |
12 | | from sklearn import config_context
13 | | from sklearn.base import BaseEstimator, ClassifierMixin
14 | | from sklearn.cluster import KMeans
15 | | from sklearn.datasets import (
16 | |     load_diabetes,
17 | |     make_blobs,
18 | |     make_classification,
19 | |     make_multilabel_classification,
20 | |     make_regression,
21 | | )
22 | | from sklearn.linear_model import LogisticRegression, Perceptron, Ridge
23 | | from sklearn.metrics import (
24 | |     accuracy_score,
25 | |     average_precision_score,
26 | |     balanced_accuracy_score,
27 | |     brier_score_loss,
28 | |     check_scoring,
29 | |     f1_score,
30 | |     fbeta_score,
31 | |     get_scorer,
32 | |     get_scorer_names,
33 | |     jaccard_score,
34 | |     log_loss,
35 | |     make_scorer,
36 | |     matthews_corrcoef,
37 | |     precision_score,
38 | |     r2_score,
39 | |     recall_score,
40 | |     roc_auc_score,
41 | |     top_k_accuracy_score,
42 | | )
43 | | from sklearn.metrics import cluster as cluster_module
44 | | from sklearn.metrics._scorer import (
45 | |     _check_multimetric_scoring,
46 | |     _CurveScorer,
47 | |     _MultimetricScorer,
48 | |     _PassthroughScorer,
49 | |     _Scorer,
50 | |     _BaseScorer,
51 | | )
52 | | from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
53 | | from sklearn.multiclass import OneVsRestClassifier
54 | | from sklearn.neighbors import KNeighborsClassifier
55 | | from sklearn.pipeline import Pipeline, make_pipeline
56 | | from sklearn.svm import LinearSVC
57 | | from sklearn.tests.metadata_routing_common import (
58 | |     assert_request_is_empty,
59 | | )
60 | | from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
61 | | from sklearn.utils._testing import (
62 | |     assert_almost_equal,
63 | |     assert_array_equal,
64 | |     ignore_warnings,
65 | | )
66 | | from sklearn.utils.metadata_routing import MetadataRouter, MethodMapping
   | |________________________________________________________________________^ I001
67 |
68 |   REGRESSION_SCORERS = [
   |
   = help: Organize imports

Found 2 errors.
[*] 1 fixable with the `--fix` option.

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/metrics/_scorer.py
+++ sklearn/metrics/_scorer.py
@@ -160,7 +160,9 @@
                     except AttributeError:
                         accepts = False
                     if accepts:
-                        routed_params[name].score["sample_weight"] = kwargs["sample_weight"]
+                        routed_params[name].score["sample_weight"] = kwargs[
+                            "sample_weight"
+                        ]
 
         for name, scorer in self._scorers.items():
             try:

1 file would be reformatted, 923 files already formatted

Generated for commit: fac8d46. Link to the linter CI: here

