
Commit f732cf9

add webpage files
1 parent 4153a1e commit f732cf9

File tree

6 files changed (+214 -0 lines changed)


analysis.png

186 KB

corrected-2.png

146 KB

corrected.png

1.23 MB

laser.html

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-GDXSC5Y2BD"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());

gtag('config', 'G-GDXSC5Y2BD');
</script>

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

<script src="./files/head.js"></script>

<meta name="viewport" content="width=device-width, initial-scale=1">

<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

<meta name="keywords" content="MIT,Microsoft Research,Machine Learning,Rank Reduction,Computer Science,Machine,Artificial,Intelligence">

<title>The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction</title>
<link rel="stylesheet" href="./files/font.css">
<link rel="stylesheet" href="./files/main.css">

<link rel="stylesheet" type="text/css"
      href="https://cdn.rawgit.com/dreampulse/computer-modern-web-font/master/fonts.css">
<style>
body {
  font-family: "Computer Modern Serif", serif;
  font-size: 14pt;
}

* {padding:0;margin:0;box-sizing:border-box;}
#video {
  position: relative;
  padding-bottom: 45%; /* 16:9 */
  height: 0;
}
#video iframe {
  position: absolute;
  top: 0;
  left: 0;
  width: 80%;
  height: 100%;
  transform: translateX(12.5%);
}
</style>

<style type="text/css">
/**
 * Style sheet used by new LibX tooltip code
 */

/* We insert a <div> with libx-tooltip style under the body.
 * This will inherit body's style - we can't afford to inherit undesirable
 * styles and we must redefine what we need. OTOH, some things, e.g.
 * font-size, might be ok to be inherited to stay within the page's tone.
 */
.libx-tooltip {
  display: none;
  overflow: visible;
  padding: 5px;
  z-index: 100;
  background-color: #eee;
  color: #000;
  font-weight: normal;
  font-style: normal;
  text-align: left;
  border: 2px solid #666;
  border-radius: 5px;
  -webkit-border-radius: 5px;
  -moz-border-radius: 5px;
}

.libx-tooltip p {
  /* override default 1em margin to keep paragraphs inside a tooltip closer together. */
  margin: .2em;
}
</style>

<style type="text/css">
/**
 * Style sheet used by LibX autolinking code
 */
.libx-autolink {
}
</style>

</head>

<body>

<div class="outercontainer">
<div class="container">

<div class="content project_title">
<center>
<br>
<h2>The Truth Is In There: Improving Reasoning in Language Models <br>with Layer-Selective Rank Reduction</h2>
<div class="authors">
<a href="https://pratyushasharma.github.io/">Pratyusha Sharma</a>,
<a href="https://www.jordantash.com/">Jordan Ash*</a>, and
<a href="https://dipendramisra.com/">Dipendra Misra*</a>
</div>
<!-- <br> -->
<!-- <a href="https://arxiv.org/abs/2106.02039">Paper</a> -->
<!-- <a href="./trajectory-transformer-neurips-2021.pdf">Paper</a> -->
<div>
<span class="tag">
<a href="https://arxiv.org/abs/2106.02039">Paper</a>&nbsp;
<!-- <a href="./trajectory-transformer-neurips-2021.pdf">Paper</a>&nbsp; -->
<a href="https://github.com/JannerM/trajectory-transformer">Code</a>&nbsp;
<a href="files/bib.txt">BibTex</a>&nbsp;
</span>
</div>
</center>
</div>

<br><br>

<div class="content">
<center>
<div class="text">
<p>
<div class="title"><b>Summary</b></div>
<!-- <b>
<font size="5">Summary</font>
</b> -->
<!-- &nbsp; -->
Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. <br>
Correspondingly, significant resources are allocated to research that aims to further advance this technology, <br>typically resulting in models of increasing size that are trained on increasing amounts of data. <br>
This work, however, demonstrates the surprising result that it is often possible to improve the performance of LLMs by <br>simply removing higher-order components of their constituent weight matrices in the multi-layer perceptron (MLP) <br>layers. This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after <br>training has completed, and requires no additional parameters or data. LASER can dramatically boost predictive <br>performance on question-answering tasks and across various modalities for which Transformers are used. <br>
</p>
</div>
</center>
</div>
<br>
<br>

<center>
<img width="60%" src="main.png">
<br>
<br>
<i>LAyer SElective Rank reduction (LASER) replaces a specific weight matrix W of the Transformer model by its rank-$k$ <br>approximation and observes the change in the behavior of the model. We find that this rank approximation, especially for MLP <br>weights in the later layers of the model, often offers surprising benefits to model performance.</i>
</center>
<p>
<br><br>

<div class="content">
<div class="text">
<center>
<p>
<div class="title"><b>Layer Selective Rank Reduction Improves Generalization</b></div>
<!-- <b>
<font size="5">Layer Selective Rank Reduction Improves Generalization</font>
</b> -->
<!-- &nbsp; -->
<!-- Predictive dynamics models often have excellent single-step error, but poor long-horizon accuracy due to compounding errors.
We show that Transformers are more reliable long-horizon predictors than state-of-the-art single-step models, even in continuous Markovian domains. -->
</p>
<br>
<br>
<img width="40%" src="loss.png">
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;
<br>
<br>
<i>The effect of rank reduction across different layer types is not uniform. This figure shows the effect of rank <br>reduction for GPT-J as studied on the CounterFact dataset. The dashed line is the base model loss. In the attention <br>layers (key, query, value, and output matrices), it is clear that the matrices can be significantly rank-reduced without <br>damaging the learned hypothesis, but there is very little performance gain. For the multi-layer perceptron (MLP) <br>layers, by contrast, rank reduction goes from uniformly harming to improving the model's performance (at layer 20).</i>
</center>
<br>
</div>
</div>
<br>
<br>

<div class="content">
<div class="text">
<center>
<p>
<div class="title"><b>LASER offers a kind of denoising procedure that makes weakly learned facts accessible</b></div>
<!-- <b>
<font size="5.0">
Beam search as trajectory optimizer
</font>
</b> -->
<!-- . -->
<!-- Various control settings can be reduced to slight modifications of beam search with a sequence model. -->
<ul>
<br>
<img width="40%" src="corrected.png">
<img width="11%" src="corrected-2.png">
<br>
<li>Which datapoints benefit from LASER? We analyze how frequently the &ldquo;corrected&rdquo; facts occur in the training data. <br> GPT-J is an ideal test bed for this analysis since its training data, the Pile dataset, is publicly available. <br> (a) For GPT-J evaluated on CounterFact, we retrieve all the datapoints in the training data that contain a mention of <br>both the entity of interest and the answer corresponding to each CounterFact sample. (b) A plot depicting the <br>cumulative top-10 accuracy of the model on all datapoints whose facts occur in the training data no more often than <br>the frequency indicated on the x-axis, showing how the accuracy changes before and after LASER. (c) The <br>largest boost in performance occurs for low-frequency samples: binning the data by the frequency with which the<br> corresponding facts occur in the training data shows that the maximal improvements in <br>accuracy come from datapoints whose facts occur less frequently in the training data, as opposed to those that occur<br> more frequently.</li>
<br>
<br>
</ul>
</p>
</center>
</div>
</div>

</div>
</div>

<br><br><br><br>


</body></html>
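
Since the committed page describes LASER only in prose, here is a quick illustration of the operation it names: replacing a weight matrix with its rank-k approximation via truncated SVD. This is a minimal numpy sketch under stated assumptions, not the code linked from the page; the `rank_reduce` helper, the matrix shape, and the keep-1%-of-rank setting are all illustrative choices.

```python
import numpy as np

def rank_reduce(W: np.ndarray, k: int) -> np.ndarray:
    """Return the rank-k approximation of W via truncated SVD,
    keeping only the top-k singular components."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

# Illustrative usage on a random stand-in for an MLP weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))

# Keep 1% of the maximum possible rank -- an aggressively low rank, of the
# kind the page reports can improve QA performance for later-layer MLP weights.
k = max(1, int(0.01 * min(W.shape)))
W_approx = rank_reduce(W, k)

assert np.linalg.matrix_rank(W_approx) <= k
print(W.shape, "reduced to rank", k)
```

As the page's caption states, the intervention targets one specific weight matrix at a time; the rest of the model's parameters are left untouched.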

loss.png

88.9 KB

main.png

85.4 KB
