<li><strong>OmniGIRL</strong> is a multilingual & multimodal GitHub-issue-resolution benchmark with <strong>959 tasks</strong> spanning four programming languages. Inputs may include text, screenshots, rendered web pages and other modalities.</li>
<li>For realistic evaluation, <em>we recommend</em> that methods automatically examine each task’s raw input to detect the available modalities (e.g., embedded webpages, images), retrieve the relevant content themselves, and invoke the appropriate tools, rather than relying on manual hints. This better reflects a solver’s <strong>general-purpose issue-resolution ability in real-world scenarios</strong>.</li>
<li>Our baseline system is released <em>for research purposes only</em>; please cite OmniGIRL if you use it.</li>
</ol>
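<p>A minimal sketch of the first step recommended above (detecting which modalities an issue contains), assuming the raw issue body is GitHub-flavored Markdown. <code>detect_modalities</code> and its regex patterns are hypothetical and are not part of OmniGIRL or its baseline system; fetching the detected pages or images and choosing the right tools is left to the solver.</p>

```python
import re

# Illustrative patterns only; assumes the issue body is GitHub-flavored Markdown.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((\S+?)\)")   # ![alt](url)
URL = re.compile(r"https?://[^\s()\[\]\"']+")      # bare or linked URLs
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def detect_modalities(issue_text: str) -> dict:
    """Hypothetical helper: sort the URLs in an issue into images and web pages."""
    images = set(MD_IMAGE.findall(issue_text))
    webpages = set()
    for raw_url in URL.findall(issue_text):
        # Drop sentence punctuation that the URL pattern may have swallowed.
        url = raw_url.rstrip(".,;:!?")
        if url in images or url.lower().endswith(IMAGE_EXTS):
            images.add(url)
        else:
            webpages.add(url)
    return {
        "has_text": bool(issue_text.strip()),
        "images": sorted(images),
        "webpages": sorted(webpages),
    }
```

<p>Classifying URLs by file extension is a deliberately simple heuristic: screenshots linked without Markdown image syntax and without an extension would be reported as web pages, so a real solver would likely also check the content type of whatever it fetches.</p>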
</div>
</div>
<!-- 📨 How to Submit -->
<div id="notes" class="w-100">
<h3>📨 How to Submit</h3>
<div class="inline-block mt-3">
<ol>
<li>Prepare a <code>.json</code> or <code>.jsonl</code> file. Each record must contain at least the keys <code>instance_id</code>, <code>model_name_or_path</code>, and <code>model_patch</code>.</li>
<li>Email the file to <a href="mailto:guolh8@mail2.sysu.edu.cn?subject=OmniGIRL%20Submission">guolh8@mail2.sysu.edu.cn</a>.</li>
<li>We will evaluate your submission locally and update the leaderboard once the results are verified.</li>
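<li>(Optional) If you produce predictions programmatically, here is a minimal sketch that writes them as <code>.jsonl</code> (one JSON object per line), assuming each prediction is already a Python dict. <code>write_submission</code> and <code>REQUIRED_KEYS</code> are hypothetical names, not an official OmniGIRL tool, and the check covers only the presence of the three required keys, not whether the patches apply.

```python
import json
from pathlib import Path

# Keys the submission instructions list as mandatory for every record.
REQUIRED_KEYS = ("instance_id", "model_name_or_path", "model_patch")


def write_submission(predictions: list[dict], path: str) -> int:
    """Hypothetical helper: validate records, then write them as JSONL.

    Returns the number of records written.
    """
    # Validate everything first so a bad record never leaves a half-written file.
    for i, record in enumerate(predictions):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {i} is missing required keys: {missing}")

    with Path(path).open("w", encoding="utf-8") as f:
        for record in predictions:
            # One JSON object per line; json.dumps escapes newlines inside patches.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(predictions)
```

For example, <code>write_submission(preds, "predictions.jsonl")</code>, where each dict in <code>preds</code> carries a benchmark <code>instance_id</code>, your system's name, and its patch (typically a unified diff, as in SWE-bench-style predictions).</li>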
<li>We build on prior work, namely <strong><a href="https://arxiv.org/abs/2310.06770" target="_blank">SWE-bench</a></strong>, <strong><a href="https://arxiv.org/abs/2407.01489" target="_blank">Agentless</a></strong>, and <strong><a href="https://arxiv.org/abs/2404.05427" target="_blank">AutoCodeRover</a></strong>, which laid the groundwork for this study.</li>
<li>We thank the <strong><a href="https://github.com/evalplus/evalplus" target="_blank">EvalPlus leaderboard</a></strong> team for releasing the elegant page template that inspired this site.</li>
<li>Finally, we are grateful to the <strong>open-source developer community</strong> for their invaluable contributions.</li>