Commit

Add eval instructions
john-b-yang committed Feb 4, 2025
1 parent 9a2ade5 commit 138c4c1
Showing 5 changed files with 30 additions and 4 deletions.
Binary file added img/stanford_logo.png
3 changes: 3 additions & 0 deletions index.html
@@ -4383,6 +4383,9 @@ <h3 class="text-title" style="margin-bottom:0.5em">Citation</h3>
<a href="https://princeton-nlp.github.io/">
<img src="img/princeton_seal.svg" style="height: 3em;padding-top:0.5em;padding-right: 1em" />
</a>
<a href="https://www.cs.stanford.edu/">
<img src="img/stanford_logo.png" style="height: 3em;padding-top:0.5em;padding-right: 1em;padding-left: 0.25em;" />
</a>
<a href="https://pli.princeton.edu/">
<img src="img/pli_logo.svg" style="height: 3em;padding-top:0.5em;padding-right: 1em" />
</a>
8 changes: 4 additions & 4 deletions multimodal.html
@@ -105,10 +105,10 @@ <h3 style="font-size: 20px; padding-top: 1.2em">ICLR 2025</h3>
<h2 class="text-title">About</h2>
<img src="img/teaser_mm.png" style="width:80%;margin:auto;display:block;"/>
<p class="text-content">
-SWE-bench Multimodal is a dataset for evaluating AI systems on visual software engineering tasks.
-It contains 619 task instances from 17 popular JavaScript repositories, each featuring images crucial to problem-solving.
-The dataset covers a range of challenges including UI glitches, map rendering problems, or data visualization bugs.
-SWE-bench Multimodal challenges AI systems to tackle the diverse, multimodal nature of modern software development.
+SWE-bench Multimodal is a dataset for evaluating AI systems on visual software engineering tasks.
+It contains 619 task instances from 17 popular JavaScript repositories, each featuring images crucial to problem-solving.
+The dataset covers a range of challenges including UI glitches, map rendering problems, or data visualization bugs.
+SWE-bench Multimodal challenges AI systems to tackle the diverse, multimodal nature of modern software development.
</p class="text-content">
<h3 class="text-title" style="margin-bottom:0.5em">Citation</h3>
<pre id="citation" style="border-color: #2F4F4F;"><code>@misc{yang2024swebenchmultimodalaisystems,
20 changes: 20 additions & 0 deletions submit.html
@@ -95,6 +95,26 @@ <h1 style="font-size: 60px; padding-top: 0.4em">Submit to SWE-bench</h1>
</a>
</div>
</div>
<div class="content-wrapper">
<div class="content-box">
<h3>
Evaluating on SWE-bench
</h3>
<p>
Check out the main <a href="https://github.com/swe-bench/SWE-bench">SWE-bench</a> repository
for instructions on how to generate and evaluate predictions on SWE-bench [Lite, Verified, Multimodal].
</p>
<p style="margin-top: 0.5em">
SWE-bench evaluation can be carried out either locally or via cloud compute platforms with our
<a href="https://github.com/swe-bench/sb-cli/">sb-cli</a> tool (Recommended) or
<a href="https://github.com/swe-bench/SWE-bench/blob/main/assets/evaluation.md">Modal</a>.
</p>
<p style="margin-top: 0.5em">
Evaluation for the test split of SWE-bench Multimodal is exclusively available via
<a href="https://github.com/swe-bench/sb-cli/">sb-cli</a>.
</p>
</div>
</div>
<div class="content-wrapper">
<div class="content-box">
<h3>
3 changes: 3 additions & 0 deletions template/template_index.html
@@ -460,6 +460,9 @@ <h3 class="text-title" style="margin-bottom:0.5em">Citation</h3>
<a href="https://princeton-nlp.github.io/">
<img src="img/princeton_seal.svg" style="height: 3em;padding-top:0.5em;padding-right: 1em" />
</a>
<a href="https://www.cs.stanford.edu/">
<img src="img/stanford_logo.png" style="height: 3em;padding-top:0.5em;padding-right: 1em;padding-left: 0.25em;" />
</a>
<a href="https://pli.princeton.edu/">
<img src="img/pli_logo.svg" style="height: 3em;padding-top:0.5em;padding-right: 1em" />
</a>