
@makasim
Contributor

@makasim makasim commented Oct 31, 2025

Describe Your Changes

Adjust slowness-based rerouting logic.

Rerouting now occurs only from the slowest node, and only if the cluster as a whole has enough available capacity to handle the additional load.

See the detailed proposal: #9890
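A minimal Go sketch of the two conditions above (reroute only from the slowest node, and only if the rest of the cluster can absorb its load). It assumes each node exposes a smoothed latency, a current ingestion rate, and an estimated capacity. The `node` type, its fields, and this capacity model are hypothetical simplifications, not vminsert's actual `storageNode` or the exact rule from #9890.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for vminsert's storageNode.
// The fields and units are illustrative assumptions.
type node struct {
	name         string
	avgLatencyMs float64 // smoothed send latency; higher means slower
	load         float64 // current ingestion rate, rows/sec
	capacity     float64 // estimated maximum ingestion rate, rows/sec
}

// slowestNode returns the index of the node with the highest average latency.
// On ties, the first such node wins.
func slowestNode(nodes []node) int {
	slowest := 0
	for i := 1; i < len(nodes); i++ {
		if nodes[i].avgLatencyMs > nodes[slowest].avgLatencyMs {
			slowest = i
		}
	}
	return slowest
}

// canReroute reports whether rows may be rerouted away from nodes[src]:
// src must be the slowest node, and the other nodes together must have
// enough spare capacity to absorb src's current load.
func canReroute(nodes []node, src int) bool {
	if len(nodes) < 2 || src != slowestNode(nodes) {
		return false
	}
	spare := 0.0
	for i, n := range nodes {
		if i == src {
			continue
		}
		// Only count positive headroom; overloaded nodes contribute nothing.
		if free := n.capacity - n.load; free > 0 {
			spare += free
		}
	}
	return spare >= nodes[src].load
}

func main() {
	nodes := []node{
		{"sn1", 5, 800, 1000},
		{"sn2", 40, 600, 1000},
		{"sn3", 6, 500, 1000},
	}
	for i := range nodes {
		fmt.Printf("%s: reroute allowed = %v\n", nodes[i].name, canReroute(nodes, i))
	}
}
```

With these numbers only sn2 (the slowest node) may reroute: sn1 and sn3 have 200 + 500 = 700 rows/sec of spare capacity, which covers sn2's 600 rows/sec. If sn2's load rose above 700 rows/sec, rerouting would be blocked even though sn2 is still the slowest.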

TODO:

  • tests
  • changelog
  • doc
  • real world tests

Checklist

The following checks are mandatory:

@makasim makasim changed the base branch from master to cluster October 31, 2025 14:07
@makasim makasim changed the title Slowest rerouting Enhance slowest rerouting logic Oct 31, 2025
@makasim makasim marked this pull request as ready for review October 31, 2025 17:32
@makasim makasim requested review from f41gh7, rtm0, valyala and zekker6 and removed request for valyala October 31, 2025 17:32
//
// See the comments below for detailed conditions.
func allowRerouting(snSource *storageNode, sns []*storageNode) bool {
if *disableRerouting {
Contributor Author

Should rerouting be disabled for RF>1 (replication factor greater than 1)? AFAIK, it does not work reliably at the moment.

@makasim makasim force-pushed the slowest-rerouting branch 2 times, most recently from 85c831b to 701080a
@makasim makasim changed the title Enhance slowest rerouting logic vminsert: improve slowness-based rerouting logic Oct 31, 2025
@makasim makasim changed the title vminsert: improve slowness-based rerouting logic app/vminsert: improve slowness-based rerouting logic Oct 31, 2025
@rtm0
Contributor

rtm0 commented Nov 1, 2025

I know that the tests are yet to be added, but could you share some manual steps to reproduce that this PR actually improves the ingestion rate? Maybe also share the metrics to look at in order to confirm the improvement.

@makasim
Contributor Author

makasim commented Nov 3, 2025

> I know that the tests are yet to be added, but could you share some manual steps to reproduce that this PR actually improves the ingestion rate? Maybe also share the metrics to look at in order to confirm the improvement.

Do you mean integration tests? I'm not sure one could be written: it would be slow (the averages need to warm up first) and fragile, unless we add options that shorten the warm-up time just for the sake of tests.
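To illustrate the warm-up concern: a decision based on moving averages should not be trusted until the averages have seen enough samples, so a test must either wait out that period or shorten it through a test-only option. The sketch below uses an exponentially weighted moving average with a hypothetical `minSamples` threshold; the `ewma` type and its parameters are assumptions, since the PR's actual averaging and warm-up rules are not shown in this thread.

```go
package main

import "fmt"

// ewma is an exponentially weighted moving average that is only considered
// trustworthy after minSamples observations (a hypothetical warm-up rule).
type ewma struct {
	alpha      float64 // smoothing factor in (0, 1]
	minSamples int     // samples required before the value is trusted
	value      float64
	samples    int
}

// Update folds a new observation into the average.
// The first observation seeds the average directly.
func (e *ewma) Update(x float64) {
	if e.samples == 0 {
		e.value = x
	} else {
		e.value = e.alpha*x + (1-e.alpha)*e.value
	}
	e.samples++
}

// Value returns the current average and whether the warm-up period has ended.
func (e *ewma) Value() (float64, bool) {
	return e.value, e.samples >= e.minSamples
}

func main() {
	avg := &ewma{alpha: 0.5, minSamples: 3}
	for _, latency := range []float64{10, 20, 30} {
		avg.Update(latency)
		v, ready := avg.Value()
		fmt.Printf("avg=%.1f ready=%v\n", v, ready)
	}
}
```

This prints `avg=10.0 ready=false`, `avg=15.0 ready=false`, then `avg=22.5 ready=true`. A test-only option of the kind mentioned above would amount to lowering `minSamples` (or raising `alpha`) so the signal settles faster, at the cost of reacting to noise.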

As for manual tests, I'll run some myself and share the results here, along with the scripts I used.

