Commit 988391b

chore: update confs

1 parent 959e50d

1 file changed (+7, -0)

arxiv.json

Lines changed: 7 additions & 0 deletions
@@ -43685,5 +43685,12 @@
     "pub_date": "2025-03-27",
     "summary": "A recent paper proposed Dynamic Tanh (DyT) as a drop-in replacement for Layer Normalization. Although the method is empirically well-motivated and appealing from a practical point of view, it lacks a theoretical foundation. In this work, we derive DyT mathematically and show that a well-defined approximation is needed to do so. By dropping said approximation, an alternative element-wise transformation is obtained, which we call Elementwise Layer Normalization (ELN). We demonstrate that ELN resembles Layer Normalization more accurately than DyT does.",
     "translated": "近期一篇论文提出将动态双曲正切函数(Dynamic Tanh,DyT)作为层归一化的即插即用替代方案。尽管该方法在实证研究方面具有充分依据且实用价值显著,但其理论基础尚不完善。本研究通过数学推导构建了DyT的理论框架,并证明该过程需要借助一个明确定义的近似条件。当舍弃该近似条件时,我们获得了一种新的逐元素变换方法,称为逐元素层归一化(Elementwise Layer Normalization,ELN)。实验表明,与DyT相比,ELN能更精确地模拟传统层归一化的行为特征。\n\n(翻译说明:\n1. 专业术语处理:\"drop-in replacement\"译为\"即插即用替代方案\",\"element-wise transformation\"译为\"逐元素变换\",符合计算机领域术语规范\n2. 学术表达优化:\"empirically well-motivated\"译为\"实证研究方面具有充分依据\",既保留原意又符合中文论文摘要文体\n3. 逻辑显化处理:\"a well-defined approximation is needed to do so\"增译为\"该过程需要借助...\",使推导过程的逻辑关系更清晰\n4. 术语一致性:\"Layer Normalization\"统一译为\"层归一化\",与国内计算机领域标准译法保持一致\n5. 被动语态转换:\"an alternative...is obtained\"译为主动句式\"我们获得\",符合中文表达习惯)"
+  },
+  {
+    "title": "Outlier dimensions favor frequent tokens in language models",
+    "url": "http://arxiv.org/abs/2503.21718v2",
+    "pub_date": "2025-03-27",
+    "summary": "We study last-layer outlier dimensions, i.e. dimensions that display extreme activations for the majority of inputs. We show that outlier dimensions arise in many different modern language models, and trace their function back to the heuristic of constantly predicting frequent words. We further show how a model can block this heuristic when it is not contextually appropriate, by assigning a counterbalancing weight mass to the remaining dimensions, and we investigate which model parameters boost outlier dimensions and when they arise during training. We conclude that outlier dimensions are a specialized mechanism discovered by many distinct models to implement a useful token prediction heuristic.",
+    "translated": "我们研究最后一层异常值维度,即对大多数输入表现出极端激活的维度。我们发现这种异常维度广泛存在于多种现代语言模型中,并将其功能溯源至\"持续预测高频词\"的启发式策略。进一步研究表明,当该启发式策略与上下文不符时,模型会通过为其他维度分配平衡性权重来阻断该机制。我们还探究了哪些模型参数会强化异常维度,以及它们在训练过程中何时出现。本文最终得出结论:异常维度是多种不同模型独立发现的一种专用机制,用于实现有效的词符预测启发式策略。"
   }
 ]
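
As background for the first new abstract: DyT, per the original Dynamic Tanh paper, replaces LayerNorm's per-token statistics with an element-wise tanh(alpha * x) plus a learnable affine; ELN's exact form is derived in the paper summarized above. Below is a minimal PyTorch sketch of DyT's published form, set next to LayerNorm for contrast. Parameter names such as `dim` and `alpha_init` are illustrative, not taken from this commit's data.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: element-wise tanh(alpha * x) with a learnable affine,
    proposed as a drop-in replacement for LayerNorm."""
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # scalar scale
        self.gamma = nn.Parameter(torch.ones(dim))                # per-channel gain
        self.beta = nn.Parameter(torch.zeros(dim))                # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No mean or variance statistics are computed: the squashing is
        # purely element-wise, unlike LayerNorm's per-token normalization.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

x = torch.randn(2, 16, 64)
print(DyT(64)(x).shape)           # torch.Size([2, 16, 64])
print(nn.LayerNorm(64)(x).shape)  # torch.Size([2, 16, 64])
```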
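As background for the second abstract: one way to probe for last-layer outlier dimensions is to compare each dimension's typical activation magnitude against the typical magnitude across all dimensions. The median-ratio criterion and the `ratio` threshold in this sketch are assumptions chosen for illustration, not the paper's definition.

```python
import torch

def find_outlier_dims(hidden: torch.Tensor, ratio: float = 6.0) -> torch.Tensor:
    """Flag dimensions whose typical |activation| is extreme across inputs.

    `hidden`: (n_tokens, d_model) last-layer hidden states. The
    ratio-to-median rule is an illustrative heuristic, not the
    paper's criterion.
    """
    per_dim = hidden.abs().median(dim=0).values       # typical magnitude per dim
    return (per_dim > ratio * per_dim.median()).nonzero(as_tuple=True)[0]

hidden = torch.randn(1000, 768)
hidden[:, 42] += 30.0                    # plant a synthetic outlier dimension
print(find_outlier_dims(hidden))         # tensor([42])
```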
