
Commit a13f7e9

chore: update confs

1 parent ef37278

File tree

1 file changed: +35 −0 lines changed

arxiv.json

Lines changed: 35 additions & 0 deletions
@@ -42607,5 +42607,40 @@
     "pub_date": "2025-03-12",
     "summary": "We present TRACE, a novel system for live *common ground* tracking in situated collaborative tasks. With a focus on fast, real-time performance, TRACE tracks the speech, actions, gestures, and visual attention of participants, uses these multimodal inputs to determine the set of task-relevant propositions that have been raised as the dialogue progresses, and tracks the group's epistemic position and beliefs toward them as the task unfolds. Amid increased interest in AI systems that can mediate collaborations, TRACE represents an important step forward for agents that can engage with multiparty, multimodal discourse.",
     "translated": "我们提出了TRACE系统,这是一个用于在情境化协作任务中实时追踪*共同基础*的创新系统。TRACE专注于快速、实时的性能,追踪参与者的语音、动作、手势和视觉注意力,利用这些多模态输入来确定随着对话进展而提出的任务相关命题集合,并在任务展开过程中追踪团队对这些命题的认知立场和信念。随着人们对能够调解协作的AI系统的兴趣日益增加,TRACE代表了能够参与多方、多模态对话的智能体向前迈出的重要一步。"
+  },
+  {
+    "title": "GBSVR: Granular Ball Support Vector Regression",
+    "url": "http://arxiv.org/abs/2503.10539v1",
+    "pub_date": "2025-03-13",
+    "summary": "Support Vector Regression (SVR) and its variants are widely used for regression tasks; however, because their solution involves solving an expensive quadratic programming problem, their application is limited, especially when dealing with large datasets. Additionally, SVR uses an epsilon-insensitive loss function that is sensitive to outliers and can therefore adversely affect its performance. We propose Granular Ball Support Vector Regression (GBSVR) to tackle the regression problem using the granular-ball concept. These balls are useful for simplifying complex data spaces in machine learning tasks; however, to the best of our knowledge, they have not been sufficiently explored for regression problems. Granular balls group data points into balls based on their proximity and reduce the computational cost of SVR by replacing a large number of data points with far fewer granular balls. This work also suggests a discretization method for continuous-valued attributes to facilitate the construction of granular balls. The effectiveness of the proposed approach is evaluated on several benchmark datasets, where it outperforms existing state-of-the-art approaches.",
+    "translated": "支持向量回归(SVR)及其变体被广泛用于处理回归任务。然而,由于其求解过程涉及解决一个计算代价高昂的二次规划问题,这限制了其应用,尤其是在处理大规模数据集时。此外,SVR使用了一种对异常值敏感的ε不敏感损失函数,这可能会对其性能产生不利影响。我们提出了**粒度球支持向量回归(GBSVR)**,通过利用粒度球的概念来解决回归问题。这些粒度球在简化机器学习任务的复杂数据空间方面非常有用,但据我们所知,它们在回归问题中的应用尚未得到充分探索。粒度球根据数据点的邻近性将其分组为球,并通过用更少的粒度球替换大量数据点来降低SVR的计算成本。本文还提出了一种连续值属性的离散化方法,以促进粒度球的构建。我们在多个基准数据集上评估了所提出方法的有效性,结果表明其优于现有的最先进方法。"
+  },
+  {
+    "title": "Resource efficient data transmission on animals based on machine learning",
+    "url": "http://arxiv.org/abs/2503.10277v1",
+    "pub_date": "2025-03-13",
+    "summary": "Bio-loggers, electronic devices used to track animal behaviour through various sensors, have become essential in wildlife research. Despite continuous improvements in their capabilities, bio-loggers still face significant limitations in storage, processing, and data transmission due to the constraints of size and weight, which are necessary to avoid disturbing the animals. This study aims to explore how selective data transmission, guided by machine learning, can reduce the energy consumption of bio-loggers, thereby extending their operational lifespan without requiring hardware modifications.",
+    "translated": "生物记录器(bio-loggers)是通过各种传感器追踪动物行为的电子设备,在野生动物研究中已成为不可或缺的工具。尽管其功能不断改进,但由于尺寸和重量的限制(这是为了避免干扰动物而必须的),生物记录器在存储、处理和数据传输方面仍然面临显著的限制。本研究旨在探索在机器学习指导下的选择性数据传输如何减少生物记录器的能耗,从而在不进行硬件修改的情况下延长其使用寿命。"
+  },
+  {
+    "title": "ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning",
+    "url": "http://arxiv.org/abs/2503.10166v1",
+    "pub_date": "2025-03-13",
+    "summary": "With the proliferation of images in online content, language-guided image retrieval (LGIR) has emerged as a research hotspot over the past decade, encompassing a variety of subtasks with diverse input forms. While the development of large multimodal models (LMMs) has significantly facilitated these tasks, existing approaches often address them in isolation, requiring the construction of separate systems for each task. This not only increases system complexity and maintenance costs, but also exacerbates challenges stemming from language ambiguity and complex image content, making it difficult for retrieval systems to provide accurate and reliable results. To this end, we propose ImageScope, a training-free, three-stage framework that leverages collective reasoning to unify LGIR tasks. The key insight behind the unification lies in the compositional nature of language, which transforms diverse LGIR tasks into a generalized text-to-image retrieval process, along with the reasoning of LMMs serving as a universal verification to refine the results. To be specific, in the first stage, we improve the robustness of the framework by synthesizing search intents across varying levels of semantic granularity using chain-of-thought (CoT) reasoning. In the second and third stages, we then reflect on retrieval results by verifying predicate propositions locally and performing pairwise evaluations globally. Experiments conducted on six LGIR datasets demonstrate that ImageScope outperforms competitive baselines. Comprehensive evaluations and ablation studies further confirm the effectiveness of our design.",
+    "translated": "随着在线内容中图像的激增,语言引导的图像检索(LGIR)在过去十年中已成为一个研究热点,涵盖了多种输入形式的子任务。尽管大型多模态模型(LMMs)的发展极大地促进了这些任务,但现有方法通常孤立地处理这些任务,需要为每个任务构建独立的系统。这不仅增加了系统的复杂性和维护成本,还加剧了由于语言歧义和复杂图像内容带来的挑战,使得检索系统难以提供准确和可靠的结果。为此,我们提出了ImageScope,一个无需训练的三阶段框架,利用集体推理来统一LGIR任务。统一的关键在于语言的组合性质,它将各种LGIR任务转化为广义的文本到图像检索过程,并通过LMMs的推理作为通用验证来优化结果。具体来说,在第一阶段,我们通过使用链式思维(CoT)推理在不同语义粒度上综合搜索意图,提高了框架的鲁棒性。在第二和第三阶段,我们通过局部验证谓词命题和全局进行成对评估来反思检索结果。在六个LGIR数据集上进行的实验表明,ImageScope优于竞争基线。全面的评估和消融研究进一步证实了我们设计的有效性。"
+  },
+  {
+    "title": "Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets",
+    "url": "http://arxiv.org/abs/2503.09902v1",
+    "pub_date": "2025-03-12",
+    "summary": "The rise of personalized conversational search systems has been driven by advancements in Large Language Models (LLMs), enabling these systems to retrieve and generate answers for complex information needs. However, the automatic evaluation of responses generated by Retrieval Augmented Generation (RAG) systems remains an understudied challenge. In this paper, we introduce a new resource for assessing the retrieval effectiveness and the relevance of responses generated by RAG systems, using a nugget-based evaluation framework. Built upon the foundation of TREC iKAT 2023, our dataset extends to the TREC iKAT 2024 collection, which includes 17 conversations and 20,575 passage relevance assessments, together with 2,279 extracted gold nuggets and 62 manually written gold answers from NIST assessors. While maintaining the core structure of its predecessor, this new collection enables a deeper exploration of generation tasks in conversational settings. Key improvements in iKAT 2024 include: (1) ``gold nuggets'' -- concise, essential pieces of information extracted from relevant passages of the collection -- which serve as a foundation for automatic response evaluation; (2) manually written answers to provide a gold standard for response evaluation; (3) unanswerable questions to evaluate model hallucination; (4) expanded user personas, providing richer contextual grounding; and (5) a transition from Personal Text Knowledge Base (PTKB) ranking to PTKB classification and selection. Built on this resource, we provide a framework for evaluating long-form answer generation, involving nugget extraction and nugget matching, linked to retrieval. This establishes a solid resource for advancing research in personalized conversational search and long-form answer generation. Our resources are publicly available at https://github.com/irlabamsterdam/CONE-RAG.",
+    "translated": "大型语言模型(LLMs)的进步推动了个性化对话搜索系统的兴起,使这些系统能够为复杂信息需求检索和生成答案。然而,检索增强生成(RAG)系统生成的响应的自动评估仍然是一个未被充分研究的挑战。在本文中,我们引入了一种新的资源,用于评估RAG系统生成的响应的检索效果和相关性,采用基于信息片段(nugget)的评估框架。基于TREC iKAT 2023的基础,我们的数据集扩展到了TREC iKAT 2024集合,其中包括17个对话和20,575个相关段落评估,以及2,279个提取的黄金信息片段和来自NIST评估员的62个手工编写的黄金答案。在保持其前身核心结构的同时,这一新集合使得对话环境中的生成任务得以更深入的探索。iKAT 2024的关键改进包括:(1)“黄金信息片段”——从集合的相关段落中提取的简洁、关键的信息片段——作为自动响应评估的基础;(2)手工编写的答案,为响应评估提供黄金标准;(3)无法回答的问题,用于评估模型幻觉;(4)扩展的用户角色,提供更丰富的上下文基础;(5)从个人文本知识库(PTKB)排名过渡到PTKB分类和选择。基于这一资源,我们提供了一个长答案生成评估框架,涉及信息片段提取和信息片段匹配,并与检索相关联。这为推进个性化对话搜索和长答案生成的研究建立了坚实的资源。我们的资源可在https://github.com/irlabamsterdam/CONE-RAG 公开获取。"
+  },
+  {
+    "title": "Improving the Reusability of Conversational Search Test Collections",
+    "url": "http://arxiv.org/abs/2503.09899v1",
+    "pub_date": "2025-03-12",
+    "summary": "Incomplete relevance judgments limit the reusability of test collections. When new systems are compared to previous systems that contributed to the pool, they often face a disadvantage. This is due to pockets of unjudged documents (called holes) in the test collection that the new systems return. The very nature of Conversational Search (CS) means that these holes are potentially larger and more problematic when evaluating systems. In this paper, we aim to extend CS test collections by employing Large Language Models (LLMs) to fill holes by leveraging existing judgments. We explore this problem using the TREC iKAT 23 and TREC CAsT 22 collections, where information needs are highly dynamic and the responses are much more varied, leaving bigger holes to fill. Our experiments reveal that CS collections show a trend towards less reusability in deeper turns. Also, fine-tuning the Llama 3.1 model leads to high agreement with human assessors, while few-shot prompting ChatGPT results in low agreement with humans. Consequently, filling the holes of a new system using ChatGPT leads to a larger change in that system's ranking position. In contrast, regenerating the assessment pool by few-shot prompting ChatGPT and using it for evaluation achieves a high rank correlation with human-assessed pools. We show that filling the holes using the few-shot-trained Llama 3.1 model enables a fairer comparison between the new system and the systems that contributed to the pool. Our hole-filling model, based on few-shot training of the Llama 3.1 model, can improve the reusability of test collections.",
+    "translated": "不完整的相关性判断限制了测试集的可重用性。当新系统与之前参与池构建的系统进行比较时,新系统往往处于劣势。这是由于新系统返回的文档在测试集中存在未被判断的部分(称为“空洞”)。对话式搜索(CS)的特性意味着,在评估系统时,这些空洞可能更大且更具问题性。在本文中,我们旨在通过利用大语言模型(LLMs)并结合现有的判断来填补这些空洞,从而扩展CS测试集。我们使用TREC iKAT 23和TREC CAsT 22数据集来探索这一问题,这些数据集中的信息需求高度动态,响应更加多样化,因此留下了更大的空洞需要填补。我们的实验表明,CS测试集在更深轮次的对话中显示出可重用性下降的趋势。此外,微调Llama 3.1模型能够与人类评估者达成高度一致,而通过少量样本提示ChatGPT则与人类评估者的共识较低。因此,使用ChatGPT填补新系统的空洞会导致新系统的排名位置发生较大变化。而通过少量样本提示ChatGPT模型重新生成评估池并用于评估时,能够获得与人类评估池高度一致的排名相关性。我们展示了通过少量样本训练Llama 3.1模型填补空洞,能够在新系统与参与池构建的系统之间实现更公平的比较。基于少量样本训练的Llama 3.1模型填补空洞的方法,能够提高测试集的可重用性。"
   }
 ]
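Every record added by this commit shares the same five-field shape visible in the diff: "title", "url", "pub_date", "summary", and "translated". As a minimal sketch of how such a file can be consumed (the helper names `load_entries` and `papers_since` are illustrative and not part of this repository):

```python
import json
from datetime import date


def load_entries(path="arxiv.json"):
    """Load the list of paper records from the JSON file shown in the diff.

    Each record carries the fields visible above: "title", "url",
    "pub_date", "summary", and "translated".
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def papers_since(entries, cutoff):
    """Return the records published on or after the ISO-8601 cutoff date."""
    cutoff_date = date.fromisoformat(cutoff)
    # "pub_date" values in the diff are ISO dates like "2025-03-13",
    # so they parse directly with date.fromisoformat.
    return [e for e in entries
            if date.fromisoformat(e["pub_date"]) >= cutoff_date]
```

Because the dates are ISO-8601 strings, plain string comparison would also sort them correctly; parsing to `date` objects simply makes the intent explicit.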
