Skip to content

Commit

Permalink
Added Stochastic Monkeys project
Browse files Browse the repository at this point in the history
  • Loading branch information
jason-vega committed Nov 5, 2024
1 parent f873863 commit 1d5081f
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 4 deletions.
23 changes: 19 additions & 4 deletions index.html
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,6 @@ <h2>
<li>
<b>Safety of Large Language Models (LLMs)</b>
<ul>
<li>
Statistically sound evaluation of LLM safety
</li>
<li>
Efficient attacks against open-source LLMs
</li>
Expand All @@ -77,6 +74,24 @@ <h2>
<p>
(* denotes equal contribution)
</p>
<div class="card bg-light text-dark">
<div class="card-body">
<h5 class="card-title">
Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
</h5>
<h6 class="card-subtitle mb-2 text-muted">
<b>Jason Vega</b>, Junsheng Huang*, Gaokai Zhang*, Hangoo Kang*, Minjia Zhang, Gagandeep Singh
</h6>
<h6 class="card-subtitle mb-2 text-muted">
Arxiv, 2024 (to appear); under peer review
</h6>
<p class="card-text">
We show that low-resource and unsophisticated attackers, i.e. <i>stochastic monkeys</i>, can significantly improve their chances of bypassing safety alignment of SoTA LLMs with just 25 random augmentations per prompt.
</p>
<a href="papers/stochastic_monkeys.pdf" class="card-link">Paper</a>
</div>
</div>
<br>
<div class="card bg-light text-dark">
<div class="card-body">
<h5 class="card-title">
Expand All @@ -89,7 +104,7 @@ <h6 class="card-subtitle mb-2 text-muted">
ICLR 2024, Tiny Papers
</h6>
<p class="card-text">
We investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as <i>priming attacks</i> (also known as prefilling attacks), which are easy to execute and effectively bypass alignment from safety training.
We investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as priming attacks (now known as <i>prefilling attacks</i>), which are easy to execute and effectively bypass alignment from safety training.
</p>
<a href="https://arxiv.org/abs/2312.12321" class="card-link">Paper</a>
<a href="https://github.com/uiuc-focal-lab/llm-priming-attacks" class="card-link">Code</a>
Expand Down
Binary file added papers/stochastic_monkeys.pdf
Binary file not shown.

0 comments on commit 1d5081f

Please sign in to comment.