arxiv.json: 35 additions & 0 deletions
@@ -44504,5 +44504,40 @@
"pub_date": "2025-04-09",
"summary": "Personalized preference alignment for large language models (LLMs), the process of tailoring LLMs to individual users' preferences, is an emerging research direction spanning the area of NLP and personalization. In this survey, we present an analysis of works on personalized alignment and modeling for LLMs. We introduce a taxonomy of preference alignment techniques, including training time, inference time, and additionally, user-modeling based methods. We provide analysis and discussion on the strengths and limitations of each group of techniques and then cover evaluation, benchmarks, as well as open problems in the field.",
"title": "How do Large Language Models Understand Relevance? A Mechanistic\n Interpretability Perspective",
"url": "http://arxiv.org/abs/2504.07898v1",
"pub_date": "2025-04-10",
"summary": "Recent studies have shown that large language models (LLMs) can assess relevance and support information retrieval (IR) tasks such as document ranking and relevance judgment generation. However, the internal mechanisms by which off-the-shelf LLMs understand and operationalize relevance remain largely unexplored. In this paper, we systematically investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability. Using activation patching techniques, we analyze the roles of various model components and identify a multi-stage, progressive process in generating either pointwise or pairwise relevance judgment. Specifically, LLMs first extract query and document information in the early layers, then process relevance information according to instructions in the middle layers, and finally utilize specific attention heads in the later layers to generate relevance judgments in the required format. Our findings provide insights into the mechanisms underlying relevance assessment in LLMs, offering valuable implications for future research on leveraging LLMs for IR tasks.",
"summary": "Investigative workflows require interactive exploratory analysis on large heterogeneous knowledge graphs. Current databases show limitations in enabling such task. This paper discusses the architecture of Siren Federate, a system that efficiently supports exploratory graph analysis by bridging document-oriented, relational and graph models. Technical contributions include distributed join algorithms, adaptive query planning, query plan folding, semantic caching, and semi-join decomposition for path query. Semi-join decomposition addresses the exponential growth of intermediate results in path-based queries. Experiments show that Siren Federate exhibits low latency and scales well with the amount of data, the number of users, and the number of computing nodes.",
"translated": "调查分析工作流需要对大规模异构知识图谱进行交互式探索分析。现有数据库系统在支持此类任务时存在明显局限。本文阐述了Siren Federate系统的架构设计,该系统通过桥接面向文档、关系型与图数据模型,高效支持图谱探索分析。核心技术贡献包括:分布式连接算法、自适应查询规划、查询计划折叠、语义缓存以及面向路径查询的半连接分解方案。其中半连接分解有效解决了基于路径查询时中间结果的指数级增长问题。实验表明,Siren Federate系统具有低延迟特性,并在数据量、用户规模及计算节点数量增长时均展现出良好的扩展性。\n\n(译文特点说明:\n1. 专业术语处理:\"heterogeneous knowledge graphs\"译为\"异构知识图谱\",\"semi-join decomposition\"译为\"半连接分解\",保持计算机领域术语规范\n2. 技术概念转化:\"query plan folding\"译作\"查询计划折叠\",准确反映原意\n3. 句式重构:将原文复合句拆分为符合中文表达习惯的短句,如技术贡献部分采用分号列举式结构\n4. 被动语态转换:\"experiments show that...\"主动化为\"实验表明...\"\n5. 量词规范:\"a large number of\"统一处理为\"数量增长\"而非直译\"大量\")"
},
{
"title": "FairEval: Evaluating Fairness in LLM-Based Recommendations with\n Personality Awareness",
"url": "http://arxiv.org/abs/2504.07801v1",
"pub_date": "2025-04-10",
"summary": "Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. FairEval integrates personality traits with eight sensitive demographic attributes,including gender, race, and age, enabling a comprehensive assessment of user-level bias. We evaluate models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendations. FairEval's fairness metric, PAFS, achieves scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, with disparities reaching 34.79 percent. These results highlight the importance of robustness in prompt sensitivity and support more inclusive recommendation systems.",
"title": "Plan-and-Refine: Diverse and Comprehensive Retrieval-Augmented\n Generation",
"url": "http://arxiv.org/abs/2504.07794v1",
"pub_date": "2025-04-10",
"summary": "This paper studies the limitations of (retrieval-augmented) large language models (LLMs) in generating diverse and comprehensive responses, and introduces the Plan-and-Refine (P&R) framework based on a two phase system design. In the global exploration phase, P&R generates a diverse set of plans for the given input, where each plan consists of a list of diverse query aspects with corresponding additional descriptions. This phase is followed by a local exploitation phase that generates a response proposal for the input query conditioned on each plan and iteratively refines the proposal for improving the proposal quality. Finally, a reward model is employed to select the proposal with the highest factuality and coverage. We conduct our experiments based on the ICAT evaluation methodology--a recent approach for answer factuality and comprehensiveness evaluation. Experiments on the two diverse information seeking benchmarks adopted from non-factoid question answering and TREC search result diversification tasks demonstrate that P&R significantly outperforms baselines, achieving up to a 13.1% improvement on the ANTIQUE dataset and a 15.41% improvement on the TREC dataset. Furthermore, a smaller scale user study confirms the substantial efficacy of the P&R framework.",
"title": "CollEX -- A Multimodal Agentic RAG System Enabling Interactive\n Exploration of Scientific Collections",
"url": "http://arxiv.org/abs/2504.07643v1",
"pub_date": "2025-04-10",
"summary": "In this paper, we introduce CollEx, an innovative multimodal agentic Retrieval-Augmented Generation (RAG) system designed to enhance interactive exploration of extensive scientific collections. Given the overwhelming volume and inherent complexity of scientific collections, conventional search systems often lack necessary intuitiveness and interactivity, presenting substantial barriers for learners, educators, and researchers. CollEx addresses these limitations by employing state-of-the-art Large Vision-Language Models (LVLMs) as multimodal agents accessible through an intuitive chat interface. By abstracting complex interactions via specialized agents equipped with advanced tools, CollEx facilitates curiosity-driven exploration, significantly simplifying access to diverse scientific collections and records therein. Our system integrates textual and visual modalities, supporting educational scenarios that are helpful for teachers, pupils, students, and researchers by fostering independent exploration as well as scientific excitement and curiosity. Furthermore, CollEx serves the research community by discovering interdisciplinary connections and complementing visual data. We illustrate the effectiveness of our system through a proof-of-concept application containing over 64,000 unique records across 32 collections from a local scientific collection from a public university.",