arxiv.json: 35 additions & 0 deletions
@@ -45729,5 +45729,40 @@
 "pub_date": "2025-04-25",
 "summary": "High-resolution image (HRI) understanding aims to process images with a large number of pixels, such as pathological images and agricultural aerial images, both of which can exceed 1 million pixels. Vision Large Language Models (VLMs) can allegedly handle HRIs; however, a comprehensive benchmark for evaluating HRI understanding in VLMs is lacking. To address this gap, we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from 1,024 $\\times$ 1,024 to 35,503 $\\times$ 26,627. HRScene is collected and re-annotated by 10 graduate-level annotators, covering 25 scenarios ranging from microscopic and radiology images to street views, long-range pictures, and telescope images. It includes HRIs of real-world objects, scanned documents, and composite multi-images. The two diagnostic evaluation datasets are synthesized by combining the target image with the gold answer and distracting images in different orders, assessing how well models utilize regions in HRI. We conduct extensive experiments involving 28 VLMs, including Gemini 2.0 Flash and GPT-4o. Experiments on HRScene show that current VLMs achieve an average accuracy of around 50% on real-world tasks, revealing significant gaps in HRI understanding. Results on the synthetic datasets reveal that VLMs struggle to effectively utilize HRI regions, showing significant Regional Divergence and lost-in-the-middle effects, shedding light on future research.",
"title": "LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case\n Study on Neural News Recommendation",
45735
+
"url": "http://arxiv.org/abs/2504.20013v1",
+"pub_date": "2025-04-28",
+"summary": "Online fake news moderation now faces a new challenge brought by the malicious use of large language models (LLMs) in fake news production. Though existing work has shown that LLM-generated fake news is hard to detect individually, how its large-scale release will impact the news ecosystem remains underexplored. In this study, we develop a simulation pipeline and a dataset with ~56k generated news items of diverse types to investigate the effects of LLM-generated fake news within neural news recommendation systems. Our findings expose a truth decay phenomenon, in which real news gradually loses its advantageous position in news ranking against fake news as LLM-generated news enters news recommendation. We further explain why truth decay occurs from a familiarity perspective and show a positive correlation between perplexity and news ranking. Finally, we discuss the threats of LLM-generated fake news and propose possible countermeasures. We urge stakeholders to address this emerging challenge to preserve the integrity of news ecosystems.",
"title": "Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the\n Evaluation of LLM Responses",
45742
+
"url": "http://arxiv.org/abs/2504.20006v1",
+"pub_date": "2025-04-28",
+"summary": "Battles, or side-by-side comparisons in so-called arenas that elicit human preferences, have emerged as a popular approach to assessing the output quality of LLMs. Recently, this idea has been extended to retrieval-augmented generation (RAG) systems. While undoubtedly representing an advance in evaluation, battles have at least two drawbacks, particularly in the context of complex information-seeking queries: they are neither explanatory nor diagnostic. Recently, the nugget evaluation methodology has emerged as a promising approach to evaluating the quality of RAG answers. Nuggets decompose long-form LLM-generated answers into atomic facts, highlighting the important pieces of information necessary for a \"good\" response. In this work, we apply our AutoNuggetizer framework to analyze data from roughly 7K Search Arena battles provided by LMArena in a fully automatic manner. Our results show a significant correlation between nugget scores and human preferences, showcasing the promise of our approach for explainable and diagnostic system evaluations.",
"title": "Knowledge Distillation of Domain-adapted LLMs for Question-Answering in\n Telecom",
45749
+
"url": "http://arxiv.org/abs/2504.20000v1",
+"pub_date": "2025-04-28",
+"summary": "Knowledge Distillation (KD) is one approach to reducing the size of Large Language Models (LLMs). An LLM with a smaller number of parameters (the student) is trained to mimic the performance of a larger LLM (the teacher) on a specific task. For domain-specific tasks, it is not clear whether the teacher, the student, or both must be domain-adapted. In this work, we study this problem from the perspective of a telecom-domain Question-Answering (QA) task. We systematically experiment with Supervised Fine-tuning (SFT) of the teacher only, SFT of the student only, and SFT of both prior to KD. We design experiments to study the impact of vocabulary (same and different) and KD algorithms (vanilla KD and Dual Space KD, DSKD) on the distilled model. A multi-faceted evaluation of the distillation using 14 different metrics (N-gram, embedding, and LLM-based metrics) is considered. Experimental results show that SFT of the teacher improves the performance of the distilled model when both models share the same vocabulary, irrespective of algorithm and metric. Overall, SFT of both teacher and student yields better performance across all metrics, although its statistical significance depends on the vocabulary of the teacher models.",
"summary": "Recent research on graph neural networks (GNNs) has explored mechanisms for capturing local uncertainty and exploiting graph hierarchies to mitigate data sparsity and leverage structural properties. However, the synergistic integration of these two approaches remains underexplored. In this work, we introduce a novel architecture, the Hierarchical Uncertainty-Aware Graph Neural Network (HU-GNN), which unifies multi-scale representation learning, principled uncertainty estimation, and self-supervised embedding diversity within a single end-to-end framework. Specifically, HU-GNN adaptively forms node clusters and estimates uncertainty at multiple structural scales from individual nodes to higher levels. These uncertainty estimates guide a robust message-passing mechanism and attention weighting, effectively mitigating noise and adversarial perturbations while preserving predictive accuracy on both node- and graph-level tasks. We also offer key theoretical contributions, including a probabilistic formulation, rigorous uncertainty-calibration guarantees, and formal robustness bounds. Finally, by incorporating recent advances in graph contrastive learning, HU-GNN maintains diverse, structurally faithful embeddings. Extensive experiments on standard benchmarks demonstrate that our model achieves state-of-the-art robustness and interpretability.",
+"translated": "最近关于图神经网络(GNNs)的研究探索了两种机制:通过捕捉局部不确定性来缓解数据稀疏性,以及利用图层次结构来挖掘拓扑特性。然而,这两种方法的协同整合仍未得到充分研究。本文提出了一种新颖的层次化不确定性感知图神经网络架构(HU-GNN),该架构将多尺度表征学习、理论驱动的不确定性估计与自监督嵌入多样性统一在端到端框架中。具体而言,HU-GNN能够在从单一节点到高层集群的多级结构尺度上自适应形成节点聚类并量化不确定性。这些不确定性估计可指导鲁棒的消息传递机制和注意力权重分配,在保持节点级和图级任务预测精度的同时,有效抑制噪声与对抗扰动。我们还提供了关键理论贡献,包括概率形式化建模、严格的不确定性校准保证以及形式化的鲁棒性边界证明。通过融入图对比学习的最新进展,HU-GNN能够保持具有结构保真性的多样化嵌入。在标准基准测试上的大量实验表明,我们的模型实现了最先进的鲁棒性和可解释性。",
"summary": "Retrieval-augmented generation (RAG) has become a transformative approach for enhancing large language models (LLMs) by grounding their outputs in external knowledge sources. Yet, a critical question persists: how can vast volumes of external knowledge be managed effectively within the input constraints of LLMs? Traditional methods address this by chunking external documents into smaller, fixed-size segments. While this approach alleviates input limitations, it often fragments context, resulting in incomplete retrieval and diminished coherence in generation. To overcome these shortcomings, two advanced techniques, late chunking and contextual retrieval, have been introduced, both aiming to preserve global context. Despite their potential, their comparative strengths and limitations remain unclear. This study presents a rigorous analysis of late chunking and contextual retrieval, evaluating their effectiveness and efficiency in optimizing RAG systems. Our results indicate that contextual retrieval preserves semantic coherence more effectively but requires greater computational resources. In contrast, late chunking offers higher efficiency but tends to sacrifice relevance and completeness.",