Commit 091e23e

Author: anonymous
Add submission instructions in README.md
1 parent 5755a3e · commit 091e23e

File tree: 124 files changed (+30, −118908 lines)


README.md

Lines changed: 30 additions & 2 deletions
@@ -2,18 +2,25 @@
 
 JavaBench is a project-level Java benchmark that contains four projects at graduate-level difficulty. The difficulty and quality of JavaBench are validated and guaranteed by graduate students across four years. Please check our [Leaderboard](https://java-bench.github.io/leaderboard.html) for the visualization of the evaluation results.
 
+## Updates
+
+- 2024-06-08 Publish benchmark and leaderboard
+- 2024-07-24 Add instructions for submitting results
+
 ## Benchmark Dataset
 
 The four Java projects in JavaBench are designed for undergraduate students throughout the four academic years from 2019 to 2022. We then use students’ overall scores as evidence of difficulty levels.
 
 ![Dataset](./paper_plot/images/projects.png)
 
 The benchmark dataset is accessible at `./datasets`. We provide three types of datasets with different context settings.
+
 - Maximum Context: The dataset contains as much context information as possible (limited by the LLM context window).
 - Minimum Context: The dataset contains no context information.
 - Selective Context: The dataset contains only the context information that includes method signatures of dependencies extracted by [jdeps](https://docs.oracle.com/en/java/javase/11/tools/jdeps.html).
 
 Below is the structure of the dataset:
+
 - `task_id`: The ID of the completion task, composed of the assignment number and class name.
 - `target`: The file path of the task in the Java project.
 - `code`: The code snippet that needs to be completed with `// TODO`.
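The dataset fields listed above can be inspected with a short script. This is a minimal sketch assuming the JSONL layout implied by the field list; the example record and file name are illustrative, not taken from the repository, and the real files under `./datasets` may carry additional fields:

```python
import json
from pathlib import Path

# Illustrative record shaped like the documented fields
# (`task_id`, `target`, `code`); values are made up for this sketch.
sample = {
    "task_id": "PA19-1.MyClass",
    "target": "src/main/java/MyClass.java",
    "code": "public class MyClass {\n    // TODO\n}\n",
}

# JSONL means one JSON object per line.
path = Path("example.jsonl")
path.write_text(json.dumps(sample) + "\n", encoding="utf-8")

tasks = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
for task in tasks:
    print(task["task_id"], "->", task["target"])
    # Each completion task marks the code to fill in with `// TODO`.
    assert "// TODO" in task["code"]
```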
@@ -127,7 +134,7 @@ For example:
 
 ```bash
 python evaluation.py test-wise \
---output output/result-PA19/gpt-3.5-turbo/test-wise_result.json \
+--output output/result-PA19/gpt-3.5-turbo/result-full.json \
 --tests data/dataset/testcase/test-PA19.jsonl \
 output/result-PA19/gpt-3.5-turbo/samples.jsonl
 ```
@@ -142,6 +149,27 @@ Below are the instructions for the class-wise evaluation output format:
 - `has_todo`: Indicates whether the inference result contains `// TODO`, as LLMs may exhibit laziness.
 - `can_replace`: Indicates whether the inference result contains a complete class.
 
+### Submission
+
+Now you have three files:
+
+- `samples.jsonl`: Completed code generated by LLMs.
+- `single_class.json`: Evaluation results at class-wise granularity.
+- `result-full.json`: Evaluation results at test-wise granularity.
+
+**If you're having trouble with the evaluation step, you can just upload `samples.jsonl` and we'll evaluate it for you!**
+
+The next step is to submit a pull request to the project:
+
+1. [Fork](https://help.github.com/articles/fork-a-repo/) the repository into your own GitHub account.
+2. [Clone](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) the repository to your local machine.
+3. Check out a new branch from `main`.
+4. Make a new directory under the output folder corresponding to the dataset (e.g. `./output/holistic-selective/result-PA19/gpt-3.5-turbo-1106`) and copy all the files above into it.
+5. Submit the pull request.
+6. The maintainers will review your pull request soon.
+
+Once your pull request is accepted, we will update the [Leaderboard](https://java-bench.github.io/leaderboard.html) with your results.
+
 ## Contributors
 
-## Citation
+## Citation
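The submission steps added above can be sketched as shell commands. Everything here is illustrative: `FORK_URL`, the branch name, and the model directory are placeholders, and the `git` commands for the fork/clone/push steps are shown as comments because they require your own fork:

```shell
# Sketch of the submission steps; all names are placeholders.

# 1-3. Fork in the GitHub UI, then clone your fork and branch off main:
#   git clone "$FORK_URL" JavaBench && cd JavaBench
#   git checkout -b add-my-results main

# 4. Create the output directory matching the dataset you evaluated
#    and copy the three result files into it (if they exist locally).
RESULT_DIR=output/holistic-selective/result-PA19/gpt-3.5-turbo-1106
mkdir -p "$RESULT_DIR"
for f in samples.jsonl single_class.json result-full.json; do
    if [ -f "$f" ]; then
        cp "$f" "$RESULT_DIR"/
    fi
done
ls "$RESULT_DIR"

# 5-6. Commit, push, and open the pull request from your fork:
#   git add "$RESULT_DIR"
#   git commit -m "Add gpt-3.5-turbo-1106 results"
#   git push origin add-my-results
```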
