Added Stochastic Monkeys project

jason-vega · Nov 5, 2024 · 1d5081f · 1d5081f
1 parent f873863
commit 1d5081f
Show file tree

Hide file tree

Showing 2 changed files with 19 additions and 4 deletions.
diff --git a/index.html b/index.html
@@ -61,9 +61,6 @@ <h2>
                             <li>
                                 <b>Safety of Large Language Models (LLMs)</b>
                                 <ul>
-                                    <li>
-                                        Statistically sound evaluation of LLM safety
-                                    </li>
                                     <li>
                                         Efficient attacks against open-source LLMs
                                     </li>
@@ -77,6 +74,24 @@ <h2>
                         <p>
                             (* denotes equal contribution)
                         </p>
+                        <div class="card bg-light text-dark">
+                            <div class="card-body">
+                                <h5 class="card-title">
+                                    Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
+                                </h5>
+                                <h6 class="card-subtitle mb-2 text-muted">
+                                    <b>Jason Vega</b>, Junsheng Huang*, Gaokai Zhang*, Hangoo Kang*, Minjia Zhang, Gagandeep Singh
+                                </h6>
+                                <h6 class="card-subtitle mb-2 text-muted">
+                                    Arxiv, 2024 (to appear); under peer review
+                                </h6>
+                                <p class="card-text">
+                                    We show that low-resource and unsophisticated attackers, i.e. <i>stochastic monkeys</i>, can significantly improve their chances of bypassing safety alignment of SoTA LLMs with just 25 random augmentations per prompt.
+                                </p>
+                                <a href="papers/stochastic_monkeys.pdf" class="card-link">Paper</a>
+                            </div>
+                        </div>
+                        <br>
                         <div class="card bg-light text-dark">
                             <div class="card-body">
                                 <h5 class="card-title">
@@ -89,7 +104,7 @@ <h6 class="card-subtitle mb-2 text-muted">
                                     ICLR 2024, Tiny Papers
                                 </h6>
                                 <p class="card-text">
-                                    We investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as <i>priming attacks</i> (also known as prefilling attacks), which are easy to execute and effectively bypass alignment from safety training.
+                                    We investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as priming attacks (now known as <i>prefilling attacks</i>), which are easy to execute and effectively bypass alignment from safety training.
                                 </p>
                                 <a href="https://arxiv.org/abs/2312.12321" class="card-link">Paper</a>
                                 <a href="https://github.com/uiuc-focal-lab/llm-priming-attacks" class="card-link">Code</a>

diff --git a/papers/stochastic_monkeys.pdf b/papers/stochastic_monkeys.pdf