diff --git a/img/stanford_logo.png b/img/stanford_logo.png
new file mode 100644
index 0000000..88cbd3e
Binary files /dev/null and b/img/stanford_logo.png differ
diff --git a/index.html b/index.html
index b51e12d..6f6a70e 100644
--- a/index.html
+++ b/index.html
@@ -4383,6 +4383,9 @@
- SWE-bench Multimodal is a dataset for evaluating AI systems on visual software engineering tasks.
- It contains 619 task instances from 17 popular JavaScript repositories, each featuring images crucial to problem-solving.
- The dataset covers a range of challenges including UI glitches, map rendering problems, or data visualization bugs.
- SWE-bench Multimodal challenges AI systems to tackle the diverse, multimodal nature of modern software development.
+ SWE-bench Multimodal is a dataset for evaluating AI systems on visual software engineering tasks.
+ It contains 619 task instances from 17 popular JavaScript repositories, each featuring images crucial to problem-solving.
+ The dataset covers a range of challenges including UI glitches, map rendering problems, and data visualization bugs.
+ SWE-bench Multimodal challenges AI systems to tackle the diverse, multimodal nature of modern software development.
@misc{yang2024swebenchmultimodalaisystems,
diff --git a/submit.html b/submit.html
index 32a824f..0aacdcb 100644
--- a/submit.html
+++ b/submit.html
@@ -95,6 +95,26 @@ Submit to SWE-bench
+
+
+
+ Evaluating on SWE-bench
+
+
+ Check out the main SWE-bench repository
+ for instructions on how to generate and evaluate predictions on SWE-bench [Lite, Verified, Multimodal].
+
+
+ SWE-bench evaluation can be run either locally or on cloud compute platforms, using our
+ sb-cli tool (Recommended) or
+ Modal.
+
+
+ Evaluation for the test split of SWE-bench Multimodal is exclusively available via
+ sb-cli.
+
+
+
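For reference, a minimal sketch of the two evaluation paths the new submit.html text describes. The dataset slugs, flag names, predictions file, and run id below are illustrative assumptions, not copied from the site; check the main SWE-bench repository README and the sb-cli documentation for the authoritative interfaces.

```python
import subprocess

# Local evaluation with the SWE-bench harness (assumed invocation; see the
# main SWE-bench repository for the exact flags and supported datasets).
subprocess.run(
    [
        "python", "-m", "swebench.harness.run_evaluation",
        "--dataset_name", "princeton-nlp/SWE-bench_Lite",  # or Verified / Multimodal dev
        "--predictions_path", "preds.json",                 # hypothetical predictions file
        "--max_workers", "4",
        "--run_id", "my-eval-run",                          # hypothetical run id
    ],
    check=True,
)

# Cloud evaluation via sb-cli (assumed subcommand and flags; per the text above,
# the SWE-bench Multimodal test split is only evaluable this way). sb-cli
# typically requires an API key to be configured first; see its docs.
subprocess.run(
    [
        "sb-cli", "submit", "swe-bench-m", "test",
        "--predictions_path", "preds.json",
        "--run_id", "my-eval-run",
    ],
    check=True,
)
```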
diff --git a/template/template_index.html b/template/template_index.html
index b7fdaed..5ae31b8 100644
--- a/template/template_index.html
+++ b/template/template_index.html
@@ -460,6 +460,9 @@ Citation
+
+
+