arxiv.json: 35 additions & 0 deletions
@@ -44504,5 +44504,40 @@
"pub_date": "2025-04-09",
"summary": "Personalized preference alignment for large language models (LLMs), the process of tailoring LLMs to individual users' preferences, is an emerging research direction spanning the area of NLP and personalization. In this survey, we present an analysis of works on personalized alignment and modeling for LLMs. We introduce a taxonomy of preference alignment techniques, including training time, inference time, and additionally, user-modeling based methods. We provide analysis and discussion on the strengths and limitations of each group of techniques and then cover evaluation, benchmarks, as well as open problems in the field.",
"title": "How do Large Language Models Understand Relevance? A Mechanistic\n Interpretability Perspective",
"url": "http://arxiv.org/abs/2504.07898v1",
"pub_date": "2025-04-10",
"summary": "Recent studies have shown that large language models (LLMs) can assess relevance and support information retrieval (IR) tasks such as document ranking and relevance judgment generation. However, the internal mechanisms by which off-the-shelf LLMs understand and operationalize relevance remain largely unexplored. In this paper, we systematically investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability. Using activation patching techniques, we analyze the roles of various model components and identify a multi-stage, progressive process in generating either pointwise or pairwise relevance judgment. Specifically, LLMs first extract query and document information in the early layers, then process relevance information according to instructions in the middle layers, and finally utilize specific attention heads in the later layers to generate relevance judgments in the required format. Our findings provide insights into the mechanisms underlying relevance assessment in LLMs, offering valuable implications for future research on leveraging LLMs for IR tasks.",
"summary": "Investigative workflows require interactive exploratory analysis on large heterogeneous knowledge graphs. Current databases show limitations in enabling such task. This paper discusses the architecture of Siren Federate, a system that efficiently supports exploratory graph analysis by bridging document-oriented, relational and graph models. Technical contributions include distributed join algorithms, adaptive query planning, query plan folding, semantic caching, and semi-join decomposition for path query. Semi-join decomposition addresses the exponential growth of intermediate results in path-based queries. Experiments show that Siren Federate exhibits low latency and scales well with the amount of data, the number of users, and the number of computing nodes.",
"translated": "调查分析工作流需要对大规模异构知识图谱进行交互式探索分析。现有数据库系统在支持此类任务时存在明显局限。本文阐述了Siren Federate系统的架构设计,该系统通过桥接面向文档、关系型与图数据模型,高效支持图谱探索分析。核心技术贡献包括:分布式连接算法、自适应查询规划、查询计划折叠、语义缓存以及面向路径查询的半连接分解方案。其中半连接分解有效解决了基于路径查询时中间结果的指数级增长问题。实验表明,Siren Federate系统具有低延迟特性,并在数据量、用户规模及计算节点数量增长时均展现出良好的扩展性。\n\n(译文特点说明:\n1. 专业术语处理:\"heterogeneous knowledge graphs\"译为\"异构知识图谱\",\"semi-join decomposition\"译为\"半连接分解\",保持计算机领域术语规范\n2. 技术概念转化:\"query plan folding\"译作\"查询计划折叠\",准确反映原意\n3. 句式重构:将原文复合句拆分为符合中文表达习惯的短句,如技术贡献部分采用分号列举式结构\n4. 被动语态转换:\"experiments show that...\"主动化为\"实验表明...\"\n5. 量词规范:\"a large number of\"统一处理为\"数量增长\"而非直译\"大量\")"
},
{
"title": "FairEval: Evaluating Fairness in LLM-Based Recommendations with\n Personality Awareness",
"url": "http://arxiv.org/abs/2504.07801v1",
"pub_date": "2025-04-10",
"summary": "Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. FairEval integrates personality traits with eight sensitive demographic attributes,including gender, race, and age, enabling a comprehensive assessment of user-level bias. We evaluate models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendations. FairEval's fairness metric, PAFS, achieves scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, with disparities reaching 34.79 percent. These results highlight the importance of robustness in prompt sensitivity and support more inclusive recommendation systems.",
"title": "Plan-and-Refine: Diverse and Comprehensive Retrieval-Augmented\n Generation",
"url": "http://arxiv.org/abs/2504.07794v1",
"pub_date": "2025-04-10",
"summary": "This paper studies the limitations of (retrieval-augmented) large language models (LLMs) in generating diverse and comprehensive responses, and introduces the Plan-and-Refine (P&R) framework based on a two phase system design. In the global exploration phase, P&R generates a diverse set of plans for the given input, where each plan consists of a list of diverse query aspects with corresponding additional descriptions. This phase is followed by a local exploitation phase that generates a response proposal for the input query conditioned on each plan and iteratively refines the proposal for improving the proposal quality. Finally, a reward model is employed to select the proposal with the highest factuality and coverage. We conduct our experiments based on the ICAT evaluation methodology--a recent approach for answer factuality and comprehensiveness evaluation. Experiments on the two diverse information seeking benchmarks adopted from non-factoid question answering and TREC search result diversification tasks demonstrate that P&R significantly outperforms baselines, achieving up to a 13.1% improvement on the ANTIQUE dataset and a 15.41% improvement on the TREC dataset. Furthermore, a smaller scale user study confirms the substantial efficacy of the P&R framework.",
"title": "CollEX -- A Multimodal Agentic RAG System Enabling Interactive\n Exploration of Scientific Collections",
"url": "http://arxiv.org/abs/2504.07643v1",
"pub_date": "2025-04-10",
"summary": "In this paper, we introduce CollEx, an innovative multimodal agentic Retrieval-Augmented Generation (RAG) system designed to enhance interactive exploration of extensive scientific collections. Given the overwhelming volume and inherent complexity of scientific collections, conventional search systems often lack necessary intuitiveness and interactivity, presenting substantial barriers for learners, educators, and researchers. CollEx addresses these limitations by employing state-of-the-art Large Vision-Language Models (LVLMs) as multimodal agents accessible through an intuitive chat interface. By abstracting complex interactions via specialized agents equipped with advanced tools, CollEx facilitates curiosity-driven exploration, significantly simplifying access to diverse scientific collections and records therein. Our system integrates textual and visual modalities, supporting educational scenarios that are helpful for teachers, pupils, students, and researchers by fostering independent exploration as well as scientific excitement and curiosity. Furthermore, CollEx serves the research community by discovering interdisciplinary connections and complementing visual data. We illustrate the effectiveness of our system through a proof-of-concept application containing over 64,000 unique records across 32 collections from a local scientific collection from a public university.",