feat(sdk): add Benchmark and AsyncBenchmark classes #714
Changes from all commits
f79e46c
1e71c1d
6d95fc0
b67f5f7
591aa7a
4959d83
0f8c60f
bf54da8
1cd6d70
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,164 @@ | ||
| """AsyncBenchmark resource class for asynchronous operations.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from typing import List | ||
| from typing_extensions import Unpack, override | ||
|
|
||
| from ..types import BenchmarkView | ||
| from ._types import ( | ||
| BaseRequestOptions, | ||
| LongRequestOptions, | ||
| SDKBenchmarkUpdateParams, | ||
| SDKBenchmarkListRunsParams, | ||
| SDKBenchmarkStartRunParams, | ||
| ) | ||
| from .._types import SequenceNotStr | ||
| from .._client import AsyncRunloop | ||
| from .async_benchmark_run import AsyncBenchmarkRun | ||
|
|
||
|
|
||
| class AsyncBenchmark: | ||
|
Contributor
Let's highlight that this is a handle to benchmark management operations, but that to understand what is in the benchmark, you need a BenchmarkView. This is somewhat stated here, but I think it would be helpful to be more explicit. What do you think of this?

A handle for managing a Runloop Benchmark. This provides async methods for retrieving benchmark details.... ... The [BenchmarkView](some link) object contains details about the contents of the benchmark. The info() call and various update methods all return the most recent benchmark state.

Or something like that?
Contributor Author
this is true of all the classes we have so far: to understand what is actually in the object X, we have to call |
||
| """A benchmark for evaluating agent performance across scenarios (async). | ||
|
|
||
| Provides async methods for retrieving benchmark details, updating the benchmark, | ||
| managing scenarios, and starting benchmark runs. Obtain instances via | ||
| ``runloop.benchmark.from_id()`` or ``runloop.benchmark.list()``. | ||
|
Contributor
Is there a way to create a link to the BenchmarkOps definitions here? That would make the resulting docs really easy to navigate. E.g., maybe something like this?

You obtain a benchmark with the [runloop.benchmark](some useful link) operations, such as

Even better if we can link to the specific methods, but that is less critical IMO. (Just as long as we can get people close...)
Contributor Author
will do once i add the BenchmarkOps classes! the plan is to add them in a separate pr once this one is merged |
||
|
|
||
| Example: | ||
| >>> benchmark = runloop.benchmark.from_id("bmd_xxx") | ||
| >>> info = await benchmark.get_info() | ||
sid-rl marked this conversation as resolved.
|
||
| >>> run = await benchmark.start_run(run_name="evaluation-v1") | ||
| >>> for scenario_id in info.scenario_ids: | ||
| ... scenario = runloop.scenario.from_id(scenario_id) | ||
| ... scenario_run = await scenario.run(benchmark_run_id=run.id, run_name="evaluation-v1") | ||
| """ | ||
|
|
||
| def __init__(self, client: AsyncRunloop, benchmark_id: str) -> None: | ||
| """Create an AsyncBenchmark instance. | ||
|
|
||
| :param client: AsyncRunloop client instance | ||
| :type client: AsyncRunloop | ||
| :param benchmark_id: Benchmark ID | ||
| :type benchmark_id: str | ||
| """ | ||
| self._client = client | ||
| self._id = benchmark_id | ||
|
|
||
| @override | ||
| def __repr__(self) -> str: | ||
| return f"<AsyncBenchmark id={self._id!r}>" | ||
|
|
||
| @property | ||
| def id(self) -> str: | ||
| """Return the benchmark ID. | ||
|
|
||
| :return: Unique benchmark ID | ||
| :rtype: str | ||
| """ | ||
| return self._id | ||
|
|
||
| async def get_info( | ||
| self, | ||
| **options: Unpack[BaseRequestOptions], | ||
| ) -> BenchmarkView: | ||
| """Retrieve current benchmark details. | ||
|
|
||
| :param options: See :typeddict:`~runloop_api_client.sdk._types.BaseRequestOptions` for available options | ||
| :return: Current benchmark info | ||
| :rtype: BenchmarkView | ||
| """ | ||
| return await self._client.benchmarks.retrieve( | ||
| self._id, | ||
| **options, | ||
| ) | ||
|
|
||
| async def update( | ||
| self, | ||
| **params: Unpack[SDKBenchmarkUpdateParams], | ||
| ) -> BenchmarkView: | ||
| """Update the benchmark. | ||
|
|
||
| Only provided fields will be updated. | ||
|
|
||
| :param params: See :typeddict:`~runloop_api_client.sdk._types.SDKBenchmarkUpdateParams` for available parameters | ||
| :return: Updated benchmark info | ||
| :rtype: BenchmarkView | ||
| """ | ||
| return await self._client.benchmarks.update( | ||
| self._id, | ||
| **params, | ||
| ) | ||
|
|
||
| async def start_run( | ||
| self, | ||
| **params: Unpack[SDKBenchmarkStartRunParams], | ||
| ) -> AsyncBenchmarkRun: | ||
| """Start a new benchmark run. | ||
|
|
||
| Creates a new benchmark run and returns an AsyncBenchmarkRun instance for | ||
| managing the run lifecycle. | ||
|
|
||
| :param params: See :typeddict:`~runloop_api_client.sdk._types.SDKBenchmarkStartRunParams` for available parameters | ||
| :return: AsyncBenchmarkRun instance for managing the run | ||
| :rtype: AsyncBenchmarkRun | ||
| """ | ||
| run_view = await self._client.benchmarks.start_run( | ||
| benchmark_id=self._id, | ||
| **params, | ||
| ) | ||
| return AsyncBenchmarkRun(self._client, run_view.id, run_view.benchmark_id) | ||
|
|
||
| async def add_scenarios( | ||
| self, | ||
| scenario_ids: SequenceNotStr[str], | ||
| **options: Unpack[LongRequestOptions], | ||
| ) -> BenchmarkView: | ||
| """Add scenarios to the benchmark. | ||
|
|
||
| :param scenario_ids: List of scenario IDs to add | ||
| :type scenario_ids: SequenceNotStr[str] | ||
| :param options: See :typeddict:`~runloop_api_client.sdk._types.LongRequestOptions` for available options | ||
| :return: Updated benchmark info | ||
| :rtype: BenchmarkView | ||
| """ | ||
| return await self._client.benchmarks.update_scenarios( | ||
| self._id, | ||
| scenarios_to_add=scenario_ids, | ||
| **options, | ||
| ) | ||
|
|
||
| async def remove_scenarios( | ||
| self, | ||
| scenario_ids: SequenceNotStr[str], | ||
| **options: Unpack[LongRequestOptions], | ||
| ) -> BenchmarkView: | ||
| """Remove scenarios from the benchmark. | ||
|
|
||
| :param scenario_ids: List of scenario IDs to remove | ||
| :type scenario_ids: SequenceNotStr[str] | ||
| :param options: See :typeddict:`~runloop_api_client.sdk._types.LongRequestOptions` for available options | ||
| :return: Updated benchmark info | ||
| :rtype: BenchmarkView | ||
| """ | ||
| return await self._client.benchmarks.update_scenarios( | ||
| self._id, | ||
| scenarios_to_remove=scenario_ids, | ||
| **options, | ||
| ) | ||
|
|
||
| async def list_runs( | ||
| self, | ||
| **params: Unpack[SDKBenchmarkListRunsParams], | ||
| ) -> List[AsyncBenchmarkRun]: | ||
| """List runs for this benchmark (one page of results per call). | ||
|
|
||
| :param params: See :typeddict:`~runloop_api_client.sdk._types.SDKBenchmarkListRunsParams` for available parameters | ||
| :return: List of async benchmark runs | ||
| :rtype: List[AsyncBenchmarkRun] | ||
| """ | ||
| page = await self._client.benchmarks.runs.list( | ||
| benchmark_id=self._id, | ||
| **params, | ||
| ) | ||
| return [AsyncBenchmarkRun(self._client, run.id, run.benchmark_id) for run in page.runs] | ||
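The review thread above distinguishes the benchmark handle (client plus ID) from the `BenchmarkView` that describes what the benchmark contains. As a rough illustration of that split, here is a minimal sketch that runs without the Runloop SDK. `FakeBenchmarkView`, `FakeBenchmarksClient`, and `BenchmarkHandle` are hypothetical stand-ins invented for this example, and the in-memory behavior is an assumption, not the real API's semantics.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class FakeBenchmarkView:
    """Hypothetical stand-in for BenchmarkView: a snapshot of benchmark state."""

    id: str
    name: str
    scenario_ids: List[str] = field(default_factory=list)


class FakeBenchmarksClient:
    """Hypothetical in-memory stand-in for the API client (not the Runloop SDK)."""

    def __init__(self) -> None:
        self._store: Dict[str, FakeBenchmarkView] = {}

    def seed(self, view: FakeBenchmarkView) -> None:
        self._store[view.id] = view

    async def retrieve(self, benchmark_id: str) -> FakeBenchmarkView:
        view = self._store[benchmark_id]
        # Return a copy so callers hold a snapshot, not live state.
        return FakeBenchmarkView(view.id, view.name, list(view.scenario_ids))

    async def update_scenarios(
        self,
        benchmark_id: str,
        *,
        scenarios_to_add: Sequence[str] = (),
        scenarios_to_remove: Sequence[str] = (),
    ) -> FakeBenchmarkView:
        view = self._store[benchmark_id]
        to_remove = set(scenarios_to_remove)
        remaining = [s for s in view.scenario_ids if s not in to_remove]
        added = [s for s in scenarios_to_add if s not in remaining]
        view.scenario_ids = remaining + added
        return await self.retrieve(benchmark_id)


class BenchmarkHandle:
    """Holds only the client and the ID; state always comes back as a view."""

    def __init__(self, client: FakeBenchmarksClient, benchmark_id: str) -> None:
        self._client = client
        self._id = benchmark_id

    async def get_info(self) -> FakeBenchmarkView:
        return await self._client.retrieve(self._id)

    async def add_scenarios(self, scenario_ids: Sequence[str]) -> FakeBenchmarkView:
        return await self._client.update_scenarios(self._id, scenarios_to_add=scenario_ids)

    async def remove_scenarios(self, scenario_ids: Sequence[str]) -> FakeBenchmarkView:
        return await self._client.update_scenarios(self._id, scenarios_to_remove=scenario_ids)


async def demo() -> List[List[str]]:
    client = FakeBenchmarksClient()
    client.seed(FakeBenchmarkView(id="bmd_xxx", name="demo", scenario_ids=["scn_a"]))
    benchmark = BenchmarkHandle(client, "bmd_xxx")

    before = await benchmark.get_info()
    after_add = await benchmark.add_scenarios(["scn_b"])
    after_remove = await benchmark.remove_scenarios(["scn_a"])
    return [before.scenario_ids, after_add.scenario_ids, after_remove.scenario_ids]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the handle never caches state: `before` stays a stale snapshot after later updates, which is why each mutating method returns a fresh view.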