Description
Implementation PR
NA
Reference Issues
ENG-2226
AIRTBench-Code/airtbench/main.py
Lines 331 to 350 in 717ccf7
async def check_for_flags(content: str) -> bool:
    for match in re.findall(r"gAAAAA[a-zA-Z0-9\-_=]+", content):
        if await check_flag_api(challenge.id, match, args.platform_api_key):
            logger.success(f"|- Got the flag for {challenge.id}!")
            dn.log_metric("found_flag", 1)
            dn.log_metric(
                "flag_found",
                1.0,
                attributes={
                    "challenge_id": challenge.id,
                    "flag": match[:10] + "...",
                },
            )
            return True
    return False

# Call this first on the input in case the model already has it
if await check_for_flags(chat.last.content):
    return None
Summary
When a flag is detected in either the model's response or the code execution output, the function immediately returns None. This ends the challenge early and discards the full pipeline object, including the context in which the flag was found.
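One possible direction, sketched here only as an illustration: return a small result object that records where the flag was seen, rather than None. Everything in this sketch is hypothetical and not taken from the repository: the `FlagResult` dataclass, the `source` labels, the `step` function, and `fake_check_flag_api`, which stands in for the real `check_flag_api` call so the example runs offline.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Optional

# Same pattern as the check_for_flags snippet quoted in this issue.
FLAG_PATTERN = re.compile(r"gAAAAA[a-zA-Z0-9\-_=]+")


@dataclass
class FlagResult:
    """Hypothetical container that keeps the context instead of dropping it."""
    found: bool
    source: Optional[str] = None        # "model_response" or "execution_output"
    context: Optional[str] = None       # full text in which the flag appeared
    flag_preview: Optional[str] = None  # truncated flag, as in the logged metric


async def fake_check_flag_api(candidate: str) -> bool:
    """Offline stand-in for check_flag_api (assumption: accepts long candidates)."""
    return len(candidate) > 10


async def check_for_flags(content: str, source: str) -> FlagResult:
    """Return a FlagResult describing the first validated flag, if any."""
    for match in FLAG_PATTERN.findall(content):
        if await fake_check_flag_api(match):
            return FlagResult(True, source, content, match[:10] + "...")
    return FlagResult(False)


async def step(model_response: str, output: str) -> FlagResult:
    """Check both places a flag can appear and keep the surrounding text."""
    candidates = (
        ("model_response", model_response),
        ("execution_output", output),
    )
    for source, text in candidates:
        result = await check_for_flags(text, source)
        if result.found:
            return result
    return FlagResult(False)


if __name__ == "__main__":
    result = asyncio.run(step("no flag here", "result: gAAAAABexample_token=="))
    print(result.source, result.flag_preview)
```

In the real code the returned object would more likely be the existing chat or pipeline state than a new dataclass; the point of the sketch is only that the caller receives the context along with the "flag found" signal.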
Basic Example
Steps to reproduce:
Trigger a scenario where a flag is present in either:
- The model's response (chat.last.content)
- The code execution output (output)
Observe the following code logic:
# Line 367
if await check_for_flags(chat.last.content):
    return None
# Line 449
if await check_for_flags(output):
    return None
Drawbacks
NA
Unresolved questions
NA