Replies: 1 comment
hey @kuatroka, are you asking about the promptfoo UI or the deepeval UI?
-
Hi, I want to evaluate different coding agents with different models, and to create, annotate, and evaluate the output manually, without any actual model runs or API calls.
I want to give codex / claude code / droid / opencode, etc. (any permutation of model and agent) the same prompt, run them, then manually evaluate and annotate whether they did what I asked, plus annotate some subjective outcome (pretty UI or bad UI, etc.), and afterwards view reporting results across my experiments with different agent/model permutations and prompt versions.
Basically, I want an option to create a custom model name (not a full working setup, no working API), for example "codex/gpt-5-codex-mini", "codex/gpt-5-codex-high", or "cc/opus-4-5", give them "prompt20-v-1" and "prompt20-v-2", and, without running them through promptfoo, annotate the result and even skip the output completely or fill it in manually.
Is there a way to do this now, without creating any new features?
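To make it concrete, here is roughly the kind of config I have in mind. This is only a sketch, not something I've gotten working: I'm assuming promptfoo's YAML config format, and the `echo` provider plus `label` field are my guess at how a no-API "model" could be faked, so the exact field names may be wrong.

```yaml
# promptfooconfig.yaml -- sketch only; no API is ever called.
prompts:
  - "prompt20-v-1"   # stand-in; the real prompt text would go here
  - "prompt20-v-2"

providers:
  # echo just returns the prompt unchanged; I would paste each agent's
  # real output in by hand (or leave it blank) and grade it manually.
  - id: echo
    label: codex/gpt-5-codex-mini
  - id: echo
    label: codex/gpt-5-codex-high
  - id: echo
    label: cc/opus-4-5
```

The `label` would then show up in the results view as if it were a distinct model, which is all I really need for the manual-annotation workflow.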
Thanks