fix(mbpp): add special oracles by soryxie · Pull Request #212 · evalplus/evalplus

soryxie · 2024-06-09T03:08:52Z

fix #210
Add special oracles for

Mbpp/581
Mbpp/558

Both problems have more than one acceptable solution.

Tested with base_input and plus_input
Tested with evaluate framework
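As an illustration of what a multi-answer oracle could look like, here is a minimal sketch. It is not the code from this PR: the names `ref_slant`, `ref_vertical`, and `special_oracle` are hypothetical, and the two references simply mirror two readings of Mbpp/581 (the `height` argument taken as the slant height, or as the vertical height).

```python
import math


def ref_slant(base, height):
    # Reading 1: `height` is used directly as the slant height of each face.
    return base * base + 2 * base * height


def ref_vertical(base, height):
    # Reading 2: `height` is the vertical height, so derive the slant height first.
    slant = math.sqrt((base / 2) ** 2 + height ** 2)
    return round(base * base + 2 * base * slant)


def special_oracle(candidate, inp, references):
    """Hypothetical oracle: pass if the candidate agrees with any reference."""
    out = candidate(*inp)
    return any(out == ref(*inp) for ref in references)


# A candidate following either reading is accepted; an unrelated answer is not.
print(special_oracle(ref_vertical, (3, 4), [ref_slant, ref_vertical]))
print(special_oracle(lambda b, h: 0, (3, 4), [ref_slant, ref_vertical]))
```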

ganler · 2024-06-10T03:02:38Z

maybe also need to fix https://github.com/evalplus/evalplus/blob/master/tools/mbpp/to_original_fmt.py

soryxie · 2024-06-17T14:46:24Z

> maybe also need to fix https://github.com/evalplus/evalplus/blob/master/tools/mbpp/to_original_fmt.py

I don't think that needs to be modified. This PR merely accepts two additional valid solutions, and the original canonical_solution remains valid :)

soryxie · 2024-06-17T14:48:52Z

I have tested this on EvalPlus, and this modification does not affect other problems. After the modification, it allows for more solutions to pass.

ganler · 2024-06-18T04:13:11Z

> > maybe also need to fix https://github.com/evalplus/evalplus/blob/master/tools/mbpp/to_original_fmt.py
>
> I don't think that needs to be modified. This PR merely accepts two additional valid solutions, and the original canonical_solution remains valid :)

Because we have now added special oracles, we should also reflect them when exporting to the original format, as is already done for HumanEval:

evalplus/tools/humaneval/to_original_fmt.py

Lines 82 to 85 in d4981ad

    
    if entry_point == "find_zero":
        imports.add("import math")
        aux_fn = inspect.getsource(_poly) + "\n"
        assertion = f"assert _poly(*candidate(*inp), inp) <= {atol}"
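To make the idea of "reflecting an oracle in the exported tests" concrete, here is a rough sketch under stated assumptions. `AUX_SRC`, `_accepts`, and `build_test` are hypothetical names, and the generated layout is invented for illustration; it is not the format that `to_original_fmt.py` actually emits. The point is only that the exported test string has to carry both the helper source and an assertion that calls it.

```python
# Hypothetical helper source, prepended to the exported test as plain text
# (an assumption for this sketch, not the real export code).
AUX_SRC = '''\
def _accepts(out, expected_options):
    return any(out == exp for exp in expected_options)
'''


def build_test(inputs, expected_options_per_input):
    """Build a self-contained test string with one assertion per input."""
    lines = [AUX_SRC, "def check(candidate):"]
    for inp, options in zip(inputs, expected_options_per_input):
        lines.append(f"    assert _accepts(candidate(*{inp!r}), {options!r})")
    return "\n".join(lines) + "\n"


# One input with two accepted outputs.
test_src = build_test([(3, 4)], [[33, 35]])
namespace = {}
exec(test_src, namespace)
namespace["check"](lambda base, height: 33)  # passes
namespace["check"](lambda base, height: 35)  # passes as well
```

Embedding the helper as source text keeps the exported test runnable on its own, which is the same reason the HumanEval snippet above copies `_poly` into the output.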

soryxie · 2024-06-18T07:11:06Z

I see.
The original-format dataset now works with the test script below.

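# `data` is assumed to hold the exported original-format MBPP dataset,
# with data[581] being the entry for Mbpp/581 (loading code not shown).
# test 581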
exec_code_0 = """\
def surface_Area(base, height):
    return (base * base) + (2 * base * height)
"""

exec_code_1 = """\
import math
def surface_Area(base_edge, height):
    slant_height = math.sqrt((base_edge / 2) ** 2 + height ** 2)
    base_area = base_edge ** 2
    lateral_area = 4 * (base_edge * slant_height) / 2
    total_surface_area = base_area + lateral_area
    return round(total_surface_area)
"""
exec(exec_code_0+data[581]['test'], globals())
exec(exec_code_1+data[581]['test'], globals())

# test 558
exec_code_0 = """\
def digit_distance_nums(n1, n2):
    return sum([abs(int(c1) - int(c2)) for c1, c2 in zip(str(n1), str(n2))])
"""

exec_code_1 = """\
def digit_distance_nums(num1: int, num2: int) -> int:
    str_num1 = str(num1)
    str_num2 = str(num2)

    max_length = max(len(str_num1), len(str_num2))

    padded_num1 = str_num1.zfill(max_length)
    padded_num2 = str_num2.zfill(max_length)

    return sum(abs(int(digit1) - int(digit2)) for digit1, digit2 in zip(padded_num1, padded_num2))
"""

exec(exec_code_0+data[558]['test'], globals())
exec(exec_code_1+data[558]['test'], globals())
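For readers wondering where the two Mbpp/558 solutions above can disagree, the following sketch isolates the difference. The function names `dist_zip` and `dist_padded` are made up for this example; the bodies follow the two solutions in the script. `zip()` stops at the shorter string, while the padded variant left-fills with zeros, so the results differ only when the inputs have different digit counts.

```python
def dist_zip(n1, n2):
    # zip() truncates to the shorter string, so surplus digits are ignored.
    return sum(abs(int(a) - int(b)) for a, b in zip(str(n1), str(n2)))


def dist_padded(n1, n2):
    # Left-pad the shorter number with zeros so every digit is compared.
    s1, s2 = str(n1), str(n2)
    width = max(len(s1), len(s2))
    return sum(abs(int(a) - int(b)) for a, b in zip(s1.zfill(width), s2.zfill(width)))


print(dist_zip(123, 256), dist_padded(123, 256))  # same length: 7 7
print(dist_zip(1, 23), dist_padded(1, 23))        # different length: 1 4
```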

fix(mbpp): add special oracles

c36c298

soryxie requested a review from ganler June 9, 2024 03:09

fix to original fmt

477ef08

ganler merged commit f86db47 into master Jun 18, 2024

ganler merged 2 commits into master from add_mbpp_oracle
