In Table 8, we first evaluate each large language model's code generation capability in a single-file setting as a baseline. Next, using BM25 as the similarity metric, we retrieve similar code from within the project based on the context and include it in the prompt, then re-evaluate each model's generation performance. Finally, "w/Ref." denotes the oracle setting in which the correct reference code is assumed to be known: we retrieve similar code from the project using the reference as the query and re-evaluate with that code in the prompt. Overall, the aiXcoder-7B model achieves the best performance across all languages, indicating that it has the strongest ability to exploit contextual information, especially cross-file context.
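To make the retrieval step concrete, the sketch below implements Okapi BM25 scoring from scratch and uses it to pick the most similar snippet from a toy "project" given the current context. The corpus, tokenizer, and parameter defaults (k1=1.2, b=0.75) are illustrative assumptions, not the paper's exact setup.

```python
import math
import re
from collections import Counter

def tokenize(code: str) -> list[str]:
    # Naive code tokenizer: split on non-word characters, lowercase.
    return [t for t in re.split(r"\W+", code.lower()) if t]

def bm25_scores(query: str, corpus: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each term across the corpus.
    df = Counter()
    for d in docs:
        df.update(set(d))
    q_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in q_terms:
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Hypothetical in-project snippets; the context is the unfinished code
# the model is about to complete.
corpus = [
    "def parse_config(path): return json.load(open(path))",
    "def save_model(model, path): torch.save(model.state_dict(), path)",
    "def load_model(model, path): model.load_state_dict(torch.load(path))",
]
context = "def load_model(model, path):"
scores = bm25_scores(context, corpus)
retrieved = corpus[scores.index(max(scores))]  # prepended to the prompt
```

The retrieved snippet would then be placed before the context in the prompt, so the model can imitate project-local conventions when generating the completion.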