Doragd
diff --git a/‎papers/cikm/cikm2023.md
Lines changed: 283 additions & 283 deletions b/‎papers/cikm/cikm2023.md
Lines changed: 283 additions & 283 deletions
diff --git a/‎papers/cikm/cikm2024.md
Lines changed: 210 additions & 210 deletions b/‎papers/cikm/cikm2024.md
Lines changed: 210 additions & 210 deletions
diff --git a/‎papers/ecir/ecir2024.md
Lines changed: 1 addition & 1 deletion b/‎papers/ecir/ecir2024.md
Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 |[Incorporating Query Recommendation for Improving In-Car Conversational Search](https://doi.org/10.1007/978-3-031-56069-9_36)|Md. Rashad Al Hasan Rony, Soumya Ranjan Sahoo, Abbas Goher Khan, Ken E. Friedl, Viju Sudhi, Christian Süß||||[code](https://paperswithcode.com/search?q_meta=&q_type=&q=Incorporating+Query+Recommendation+for+Improving+In-Car+Conversational+Search)|0|
 |[ChatGPT Goes Shopping: LLMs Can Predict Relevance in eCommerce Search](https://doi.org/10.1007/978-3-031-56066-8_1)|Beatriz Soviero, Daniel Kuhn, Alexandre Salle, Viviane Pereira Moreira||||[code](https://paperswithcode.com/search?q_meta=&q_type=&q=ChatGPT+Goes+Shopping:+LLMs+Can+Predict+Relevance+in+eCommerce+Search)|0|
 |[Lottery4CVR: Neuron-Connection Level Sharing for Multi-task Learning in Video Conversion Rate Prediction](https://doi.org/10.1007/978-3-031-56069-9_31)|Xuanji Xiao, Jimmy Chen, Yuzhen Liu, Xing Yao, Pei Liu, Chaosheng Fan||||[code](https://paperswithcode.com/search?q_meta=&q_type=&q=Lottery4CVR:+Neuron-Connection+Level+Sharing+for+Multi-task+Learning+in+Video+Conversion+Rate+Prediction)|0|
-|[Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search](https://doi.org/10.1007/978-3-031-56060-6_3)|Kathryn E. Kirchoff, James Wellnitz, Joshua E. Hochuli, Travis Maxfield, Konstantin I. Popov, Shawn M. Gomez, Alexander Tropsha|Department of Computer Science, UNC Chapel Hill.; Department of Pharmacology, UNC Chapel Hill.; Eshelman School of Pharmacy, UNC Chapel Hill.|Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding-SmallSA-for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.|基于最近邻的相似性搜索是化学中的一个常见任务，在药物发现中有着显著的应用案例。然而，这项任务中一些最常用的方法仍然使用蛮力方法。在实践中，由于现代化学品数据库的庞大规模，这可能会造成计算成本高昂和过度耗时。此任务之前的计算改进通常依赖于对缺乏普遍性的硬件或数据集特定技巧的改进。利用低复杂度搜索算法的方法仍然相对缺乏探索。然而，许多这些算法是近似解决方案和/或与典型的高维化学嵌入斗争。在这里，我们评估是否结合低维化学嵌入和 k-d 树数据结构可以实现快速最近邻查询，同时保持标准化学相似性搜索基准的性能。我们考察了不同维度的标准化学嵌入降低以及一个学习，结构意识的嵌入-SmallSA-为这项任务。有了这个框架，超过十亿种化学物质的搜索在不到一秒钟的时间内在一个 CPU 核心上执行，比蛮力搜索数量级快5倍。我们亦证明 SmallSA 在化学相似性基准方面取得具竞争力的表现。|[code](https://paperswithcode.com/search?q_meta=&q_type=&q=Utilizing+Low-Dimensional+Molecular+Embeddings+for+Rapid+Chemical+Similarity+Search)|0|
+|[Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search](https://doi.org/10.1007/978-3-031-56060-6_3)|Kathryn E. Kirchoff, James Wellnitz, Joshua E. Hochuli, Travis Maxfield, Konstantin I. Popov, Shawn M. Gomez, Alexander Tropsha|Eshelman School of Pharmacy, UNC Chapel Hill.; Department of Pharmacology, UNC Chapel Hill.; Department of Computer Science, UNC Chapel Hill.|Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding-SmallSA-for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.|基于最近邻的相似性搜索是化学中的一个常见任务，在药物发现中有着显著的应用案例。然而，这项任务中一些最常用的方法仍然使用蛮力方法。在实践中，由于现代化学品数据库的庞大规模，这可能会造成计算成本高昂和过度耗时。此任务之前的计算改进通常依赖于对缺乏普遍性的硬件或数据集特定技巧的改进。利用低复杂度搜索算法的方法仍然相对缺乏探索。然而，许多这些算法是近似解决方案和/或与典型的高维化学嵌入斗争。在这里，我们评估是否结合低维化学嵌入和 k-d 树数据结构可以实现快速最近邻查询，同时保持标准化学相似性搜索基准的性能。我们考察了不同维度的标准化学嵌入降低以及一个学习，结构意识的嵌入-SmallSA-为这项任务。有了这个框架，超过十亿种化学物质的搜索在不到一秒钟的时间内在一个 CPU 核心上执行，比蛮力搜索数量级快5倍。我们亦证明 SmallSA 在化学相似性基准方面取得具竞争力的表现。|[code](https://paperswithcode.com/search?q_meta=&q_type=&q=Utilizing+Low-Dimensional+Molecular+Embeddings+for+Rapid+Chemical+Similarity+Search)|0|
 |[Evaluating the Impact of Content Deletion on Tabular Data Similarity and Retrieval Using Contextual Word Embeddings](https://doi.org/10.1007/978-3-031-56060-6_28)|Alberto Berenguer, David Tomás, JoseNorberto Mazón||||[code](https://paperswithcode.com/search?q_meta=&q_type=&q=Evaluating+the+Impact+of+Content+Deletion+on+Tabular+Data+Similarity+and+Retrieval+Using+Contextual+Word+Embeddings)|0|
 |[RIGHT: Retrieval-Augmented Generation for Mainstream Hashtag Recommendation](https://doi.org/10.1007/978-3-031-56027-9_3)|RunZe Fan, Yixing Fan, Jiangui Chen, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng||Automatic mainstream hashtag recommendation aims to accurately provide users with concise and popular topical hashtags before publication. Generally, mainstream hashtag recommendation faces challenges in the comprehensive difficulty of newly posted tweets in response to new topics, and the accurate identification of mainstream hashtags beyond semantic correctness. However, previous retrieval-based methods based on a fixed predefined mainstream hashtag list excel in producing mainstream hashtags, but fail to understand the constant flow of up-to-date information. Conversely, generation-based methods demonstrate a superior ability to comprehend newly posted tweets, but their capacity is constrained to identifying mainstream hashtags without additional features. Inspired by the recent success of the retrieval-augmented technique, in this work, we attempt to adopt this framework to combine the advantages of both approaches. Meantime, with the help of the generator component, we could rethink how to further improve the quality of the retriever component at a low cost. Therefore, we propose RetrIeval-augmented Generative Mainstream HashTag Recommender (RIGHT), which consists of three components: 1) a retriever seeks relevant hashtags from the entire tweet-hashtags set; 2) a selector enhances mainstream identification by introducing global signals; and 3) a generator incorporates input tweets and selected hashtags to directly generate the desired hashtags. The experimental results show that our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%.|自动主流话题标签推荐的目的是准确地为用户提供简洁和流行的话题标签发布前。一般来说，主流话题标签推荐面临的挑战包括: 新发布的推文在回应新话题方面的综合难度，以及在语义正确性之外对主流话题标签的准确识别。然而，以往基于固定预定义主流标签列表的检索方法在生成主流标签方面表现出色，但不能理解不断更新的信息流。相反，基于生成的方法展示了理解新发布的 tweet 的优越能力，但它们的能力仅限于识别主流标签，而没有其他特性。受近年来检索增强技术的成功启发，本文尝试采用这一框架将两种方法的优点结合起来。同时，借助于发生器组件，我们可以重新思考如何以较低的成本进一步提高检索器组件的质量。因此，我们提出了 RetrIeval 增强的生成主流 HashTag 推荐器(RIGHT) ，它由三个组成部分组成: 1)检索器从整个 tweet-HashTag 集中寻找相关的 HashTag; 2)选择器通过引入全局信号增强主流识别; 3)生成器结合输入 tweet 和选定的 HashTag 直接生成所需的 HashTag。实验结果表明，我们的方法取得了显着的改进，在最先进的基线。此外，可以很容易地将权限集成到大型语言模型中，使 ChatGPT 的性能提高10% 以上。|[code](https://paperswithcode.com/search?q_meta=&q_type=&q=RIGHT:+Retrieval-Augmented+Generation+for+Mainstream+Hashtag+Recommendation)|0|
 |[Exploring the Nexus Between Retrievability and Query Generation Strategies](https://doi.org/10.1007/978-3-031-56066-8_16)|Aman Sinha, Priyanshu Raj Mall, Dwaipayan Roy||Quantifying bias in retrieval functions through document retrievability scores is vital for assessing recall-oriented retrieval systems. However, many studies investigating retrieval model bias lack validation of their query generation methods as accurate representations of retrievability for real users and their queries. This limitation results from the absence of established criteria for query generation in retrievability assessments. Typically, researchers resort to using frequent collocations from document corpora when no query log is available. In this study, we address the issue of reproducibility and seek to validate query generation methods by comparing retrievability scores generated from artificially generated queries to those derived from query logs. Our findings demonstrate a minimal or negligible correlation between retrievability scores from artificial queries and those from query logs. This suggests that artificially generated queries may not accurately reflect retrievability scores as derived from query logs. We further explore alternative query generation techniques, uncovering a variation that exhibits the highest correlation. This alternative approach holds promise for improving reproducibility when query logs are unavailable.|通过文档可检索性评分量化检索功能中的偏差对于评估面向回忆的检索系统至关重要。然而，许多研究检索模型偏倚的研究缺乏验证其查询生成方法作为准确表示的可检索性的真实用户和他们的查询。这种局限性是由于在可检索性评估中缺乏确定的查询生成标准造成的。通常，当没有查询日志可用时，研究人员会使用文档语料库中的频繁搭配。在这项研究中，我们解决了重复性的问题，并寻求验证查询生成方法，通过比较从人工生成的查询和从查询日志得到的查询可检索性得分。我们的研究结果表明，人工查询和查询日志的可检索性得分之间的相关性很小，甚至可以忽略不计。这表明人工生成的查询可能不能准确地反映从查询日志中获得的可检索性得分。我们进一步探索替代的查询生成技术，发现具有最高相关性的变体。这种替代方法有望在查询日志不可用时提高可重复性。|[code](https://paperswithcode.com/search?q_meta=&q_type=&q=Exploring+the+Nexus+Between+Retrievability+and+Query+Generation+Strategies)|0|