Commit 96db4e9

Merge pull request #128 from stanfordnlp/frankaging-patch-1
[Minor] Update README.md
2 parents 4ac51e4 + 134dd44 commit 96db4e9

1 file changed: +13 -19 lines changed

README.md

Lines changed: 13 additions & 19 deletions
@@ -1,7 +1,7 @@
 <br />
 <div align="center">
 <h1 align="center"><img src="https://i.ibb.co/BNkhQH3/pyvene-logo.png"></h1>
-<a href="https://nlp.stanford.edu/~wuzhengx/"><strong>Library Paper and Doc Are Forthcoming »</strong></a>
+<a href="https://arxiv.org/abs/2403.07809"><strong>Read Our Paper »</strong></a>
 </div>

 <br />
@@ -241,6 +241,18 @@ intervenable.train_alignment(
 ```
 where you need to pass in a trainable dataset, and your customized loss and metrics function. The trainable interventions can later be saved on to your disk. You can also use `intervenable.evaluate()` your interventions in terms of customized objectives.

+## Citation
+Library paper is forthcoming. For now, if you use this repository, please consider to cite relevant papers:
+```stex
+@article{wu2024pyvene,
+title={pyvene: A Library for Understanding and Improving {P}y{T}orch Models via Interventions},
+author={Wu, Zhengxuan and Geiger, Atticus and Arora, Aryaman and Huang, Jing and Wang, Zheng and Noah D. Goodman and Christopher D. Manning and Christopher Potts},
+booktitle={arXiv:2403.07809},
+url={arxiv.org/abs/2403.07809},
+year={2024}
+}
+```
+
 ## Related Works in Discovering Causal Mechanism of LLMs
 If you would like to read more works on this area, here is a list of papers that try to align or discover the causal mechanisms of LLMs.
 - [Causal Abstractions of Neural Networks](https://arxiv.org/abs/2106.02997): This paper introduces interchange intervention (a.k.a. activation patching or causal scrubbing). It tries to align a causal model with the model's representations.
@@ -253,21 +265,3 @@ If you would like to read more works on this area, here is a list of papers that
 ## Star History

 [![Star History Chart](https://api.star-history.com/svg?repos=stanfordnlp/pyvene&type=Date)](https://star-history.com/#stanfordnlp/pyvene&Date)
-
-## Citation
-Library paper is forthcoming. For now, if you use this repository, please consider to cite relevant papers:
-```stex
-@article{geiger-etal-2023-DAS,
-title={Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations},
-author={Geiger, Atticus and Wu, Zhengxuan and Potts, Christopher and Icard, Thomas and Goodman, Noah},
-year={2023},
-booktitle={arXiv}
-}
-
-@article{wu-etal-2023-Boundless-DAS,
-title={Interpretability at Scale: Identifying Causal Mechanisms in Alpaca},
-author={Wu, Zhengxuan and Geiger, Atticus and Icard, Thomas and Potts, Christopher and Goodman, Noah},
-year={2023},
-booktitle={NeurIPS}
-}
-```
