Skip to content

Conversation

@sid-rl
Copy link
Contributor

@sid-rl sid-rl commented Dec 18, 2025

Add SDK classes for managing benchmarks:

  • Benchmark: Synchronous class for benchmark operations

    • get_info(): Retrieve benchmark details
    • update(): Update benchmark metadata
    • run(): Start a benchmark run, returns BenchmarkRun
    • add_scenarios(): Add scenarios to the benchmark
    • remove_scenarios(): Remove scenarios from the benchmark
    • list_runs(): List benchmark runs with filtering
  • AsyncBenchmark: Async version with the same interface

  • New TypedDicts for SDK params:

    • SDKBenchmarkUpdateParams
    • SDKBenchmarkStartRunParams
    • SDKBenchmarkListRunsParams
  • Unit tests for both sync and async classes

  • E2E smoketests

@sid-rl sid-rl changed the title feat(sdk): add Benchmark and AsyncBenchmark classes WIP feat(sdk): add Benchmark and AsyncBenchmark classes Dec 18, 2025
@sid-rl sid-rl force-pushed the siddarth/benchmark-sdk branch from 4aab259 to 14962a1 Compare December 18, 2025 18:44
@sid-rl sid-rl changed the title WIP feat(sdk): add Benchmark and AsyncBenchmark classes feat(sdk): add Benchmark and AsyncBenchmark classes Dec 18, 2025
@sid-rl sid-rl requested review from james-rl and jrvb-rl December 18, 2025 18:45
@james-rl
Copy link
Contributor

Some minor feedback to make the tests more useful / less mysterious in case of failures, but this generally looks great

Provides async methods for retrieving benchmark details, updating the benchmark,
managing scenarios, and starting benchmark runs. Obtain instances via
``runloop.benchmark.from_id()`` or ``runloop.benchmark.list()``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to create a link to the BenchmarkOps definitions here? That would make the resulting docs really easy to navigate. Eg, maybe something like this?

You obtain a benchmark with the [runloop.benchmark](some useful link) operations, such as runloop.benchmark.create() and runloop.benchmark.list()

Even better if we can link to the specific methods, but that is less critical IMO. (Just as long as we can get people close...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do once i add the BenchmarkOps classes! the plan is to add them in a separate pr once this one is merged

from .async_benchmark_run import AsyncBenchmarkRun


class AsyncBenchmark:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets highlight that this is a handle to benchmark management operations, but that to understand what is in the benchmark, you need a BenchmarkView. This is somewhat stated here, but I think it would be helpful to be more explicit. What do you think of this?


A handle for managing a Runloop Benchmark.

This provides async methods for retrieving benchmark details....

... The [BenchmarkView](some link) object contains details about the contents of the benchmark. The info() call and various update methods all return the most recent benchmark state.

Or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true of all the classes we have so far: to understand what is actually in the object X, we have to call get_info() and look at the XView. since BenchmarkView is listed as the return type of get_info() and update(), and is documented in the type reference, i think it's fine to leave as is

if benchmark_data.scenario_ids:
scenario = async_sdk_client.scenario.from_id(benchmark_data.scenario_ids[0])
scenario_runs.append(
await scenario.run(benchmark_run_id=run.id, run_name="sdk-smoketest-async-benchmark-run-scenario")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this bit starts the devbox.... We should set a small-ish lifetime for these in case some crash prevents us from cleaning up nicely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't set devbox keep_alive_time from scenario.run, but moved this to within the try block so that our cleanup is handled in case of an error

benchmark_data = benchmarks[0]

# Create Benchmark wrapper
benchmark = Benchmark(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on Benchmark obj creation -- let's use the SDK for these to keep things idiomatic.


# If the benchmark has scenarios, run one
scenario_runs: list[ScenarioRun] = []
if benchmark_data.scenario_ids:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as w/ the async, test, let's do 2 scenario runs

assert len(bench_scenario_runs) == len(scenario_runs)
for bench_scenario_run in bench_scenario_runs:
assert isinstance(bench_scenario_run, ScenarioRun)
assert bench_scenario_run.id == scenario_runs[0].id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue w/ the loop here as w/ the async case

@sid-rl sid-rl force-pushed the siddarth/benchmark-sdk branch from 4d967d8 to 1e71c1d Compare December 19, 2025 00:49
@sid-rl sid-rl requested a review from jrvb-rl December 20, 2025 00:43
Copy link
Contributor

@jrvb-rl jrvb-rl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loving the new smoketests... these are really clean. Thanks for the changes!!

@sid-rl sid-rl merged commit 8909d8a into next Dec 20, 2025
6 of 7 checks passed
@sid-rl sid-rl deleted the siddarth/benchmark-sdk branch December 20, 2025 01:18
@stainless-app stainless-app bot mentioned this pull request Dec 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants