chore: update confs

actions-user · actions-user · commit 00c9001ebf5f · 2025-08-21T10:20:27.000Z
diff --git a/arxiv.json b/arxiv.json
@@ -53429,5 +53429,40 @@
         "pub_date": "2025-08-19",
         "summary": "Recently, multimodal large language models (MLLMs) have achieved significant advancements across various domains, and corresponding evaluation benchmarks have been continuously refined and improved. In this process, benchmarks in the scientific domain have played an important role in assessing the reasoning capabilities of MLLMs. However, existing benchmarks still face three key challenges: 1) Insufficient evaluation of models' reasoning abilities in multilingual scenarios; 2) Inadequate assessment of MLLMs' comprehensive modality coverage; 3) Lack of fine-grained annotation of scientific knowledge points. To address these gaps, we propose MME-SCI, a comprehensive and challenging benchmark. We carefully collected 1,019 high-quality question-answer pairs, which involve 3 distinct evaluation modes. These pairs cover four subjects, namely mathematics, physics, chemistry, and biology, and support five languages: Chinese, English, French, Spanish, and Japanese. We conducted extensive experiments on 16 open-source models and 4 closed-source models, and the results demonstrate that MME-SCI is widely challenging for existing MLLMs. For instance, under the Image-only evaluation mode, o4-mini achieved accuracy of only 52.11%, 24.73%, 36.57%, and 29.80% in mathematics, physics, chemistry, and biology, respectively, indicating a significantly higher difficulty level compared to existing benchmarks. More importantly, using MME-SCI's multilingual and fine-grained knowledge attributes, we analyzed existing models' performance in depth and identified their weaknesses in specific domains. The Data and Evaluation Code are available at https://github.com/JCruan519/MME-SCI.",
         "translated": "近年来，多模态大语言模型（MLLMs）在各领域取得显著进展，相应的评估基准也在持续完善。在此过程中，科学领域的基准测试对衡量MLLMs的推理能力发挥着重要作用。然而现有基准仍面临三大挑战：1）对多语言场景下模型推理能力的评估不足；2）对多模态覆盖全面性的考量不充分；3）缺乏对科学知识点的细粒度标注。为弥补这些缺陷，我们提出综合性强、挑战度高的基准测试MME-SCI。通过精心收集1,019组高质量问答对，涵盖数学、物理、化学、生物四大学科，支持中、英、法、西、日五种语言，并包含三种差异化评估模式。我们对16个开源模型和4个闭源模型进行广泛实验，结果表明MME-SCI对现有MLLMs构成普遍挑战——例如在纯图像评估模式下，o4-mini在数理化生四个科目的准确率仅分别为52.11%、24.73%、36.57%和29.80%，难度显著超越现有基准。更重要的是，借助MME-SCI的多语言与细粒度知识属性，我们深入剖析了现有模型表现，揭示了其在特定领域的薄弱环节。数据集与评估代码已开源：https://github.com/JCruan519/MME-SCI。"
+    },
+    {
+        "title": "Benefiting from Negative yet Informative Feedback by Contrasting\n  Opposing Sequential Patterns",
+        "url": "http://arxiv.org/abs/2508.14786v1",
+        "pub_date": "2025-08-20",
+        "summary": "We consider the task of learning from both positive and negative feedback in a sequential recommendation scenario, as both types of feedback are often present in user interactions. Meanwhile, conventional sequential learning models usually focus on considering and predicting positive interactions, ignoring that reducing items with negative feedback in recommendations improves user satisfaction with the service. Moreover, the negative feedback can potentially provide a useful signal for more accurate identification of true user interests. In this work, we propose to train two transformer encoders on separate positive and negative interaction sequences. We incorporate both types of feedback into the training objective of the sequential recommender using a composite loss function that includes positive and negative cross-entropy as well as a cleverly crafted contrastive term that helps better model opposing patterns. We demonstrate the effectiveness of this approach in terms of increasing true-positive metrics compared to state-of-the-art sequential recommendation methods while reducing the number of wrongly promoted negative items.",
+        "translated": "本文探讨了在序列化推荐场景中同时利用正负反馈进行学习的方法，因为用户交互中通常同时存在这两种反馈类型。传统序列学习模型通常仅关注正反馈交互的考量与预测，忽视了减少推荐结果中负反馈项目对提升用户服务满意度的重要作用。事实上，负反馈能为精准识别用户真实兴趣提供有效信号。本研究提出分别使用两个Transformer编码器处理正负交互序列，通过融合正负交叉熵损失函数与精心设计的对比项构成复合损失目标，将双重反馈信号纳入序列推荐器的训练体系。该对比项有助于更好地建模对立模式，实验结果表明：相较于最先进的序列推荐方法，本方案在提升真阳性指标的同时有效减少了错误推荐负向项目的数量。"
+    },
+    {
+        "title": "OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service",
+        "url": "http://arxiv.org/abs/2508.14646v1",
+        "pub_date": "2025-08-20",
+        "summary": "Local life service is a vital scenario in Kuaishou App, where video recommendation is intrinsically linked with store's location information. Thus, recommendation in our scenario is challenging because we should take into account user's interest and real-time location at the same time. In the face of such complex scenarios, end-to-end generative recommendation has emerged as a new paradigm, such as OneRec in the short video scenario, OneSug in the search scenario, and EGA in the advertising scenario. However, in local life service, an end-to-end generative recommendation model has not yet been developed as there are some key challenges to be solved. The first challenge is how to make full use of geographic information. The second challenge is how to balance multiple objectives, including user interests, the distance between user and stores, and some other business objectives. To address the challenges, we propose OneLoc. Specifically, we leverage geographic information from different perspectives: (1) geo-aware semantic ID incorporates both video and geographic information for tokenization, (2) geo-aware self-attention in the encoder leverages both video location similarity and user's real-time location, and (3) neighbor-aware prompt captures rich context information surrounding users for generation. To balance multiple objectives, we use reinforcement learning and propose two reward functions, i.e., geographic reward and GMV reward. With the above design, OneLoc achieves outstanding offline and online performance. In fact, OneLoc has been deployed in local life service of Kuaishou App. It serves 400 million active users daily, achieving 21.016% and 17.891% improvements in gross merchandise value (GMV) and number of orders, respectively.",
+        "translated": "本地生活服务是快手App的重要场景，其中视频推荐与商家位置信息存在天然关联。因此，该场景的推荐系统面临独特挑战：需同时兼顾用户兴趣与实时地理位置。面对此类复杂场景，端到端生成式推荐已成为新范式，例如短视频场景的OneRec、搜索场景的OneSug以及广告场景的EGA。然而在本地生活服务领域，由于存在若干关键挑战，端到端生成式推荐模型尚未实现。首要挑战是如何充分利用地理信息，其次是如何平衡用户兴趣、用户与商家距离及其他商业目标的多重目标。\n\n为解决这些挑战，我们提出OneLoc模型。具体而言，我们从三个维度利用地理信息：(1) 通过融合视频内容与地理信息的语义ID生成技术实现地理感知的token化；(2) 在编码器中采用地理感知自注意力机制，同时考虑视频位置相似度与用户实时定位；(3) 通过邻居感知提示机制捕捉用户周边的丰富上下文信息以辅助生成。为平衡多目标优化，我们采用强化学习框架并设计两种奖励函数：地理奖励与GMV（商品交易总额）奖励。\n\n基于上述设计，OneLoc在离线与在线场景均取得显著效果。目前该模型已部署于快手本地生活服务平台，每日服务4亿活跃用户，在GMV和订单量两个关键指标上分别实现21.016%和17.891%的提升。"
+    },
+    {
+        "title": "MISS: Multi-Modal Tree Indexing and Searching with Lifelong Sequential\n  Behavior for Retrieval Recommendation",
+        "url": "http://arxiv.org/abs/2508.14515v1",
+        "pub_date": "2025-08-20",
+        "summary": "Large-scale industrial recommendation systems typically employ a two-stage paradigm of retrieval and ranking to handle huge amounts of information. Recent research focuses on improving the performance of retrieval models. A promising way is to introduce extensive information about users and items. On one hand, lifelong sequential behavior is valuable. Existing lifelong behavior modeling methods in ranking stage focus on the interaction of lifelong behavior and candidate items from retrieval stage. In retrieval stage, it is difficult to utilize lifelong behavior because of a large corpus of candidate items. On the other hand, existing retrieval methods mostly rely on interaction information, potentially disregarding valuable multi-modal information. To solve these problems, we present the pioneering exploration of leveraging multi-modal information and lifelong sequence model within the advanced tree-based retrieval model. We propose Multi-modal Indexing and Searching with lifelong Sequence (MISS), which contains a multi-modal index tree and a multi-modal lifelong sequence modeling module. Specifically, for better index structure, we propose multi-modal index tree, which is built using the multi-modal embedding to precisely represent item similarity. To precisely capture diverse user interests in user lifelong sequence, we propose collaborative general search unit (Co-GSU) and multi-modal general search unit (MM-GSU) for multi-perspective interests searching.",
+        "translated": "大规模工业推荐系统通常采用召回与排序的两阶段范式来处理海量信息。当前研究重点聚焦于提升召回模型的性能，其中引入丰富的用户与商品信息成为有效路径。一方面，终身序列行为数据具有重要价值。排序阶段现有的终身行为建模方法主要关注终身行为与召回阶段候选商品的交互，而在召回阶段因候选商品库规模庞大，难以有效利用终身行为数据。另一方面，现有召回方法大多依赖交互信息，可能忽略宝贵的多模态信息。针对这些问题，我们首次在基于树的先进召回模型中探索融合多模态信息与终身序列建模，提出多模态索引与终身序列搜索框架（MISS）。该框架包含多模态索引树和多模态终身序列建模模块：通过构建多模态索引树优化索引结构，利用多模态嵌入精准表征商品相似度；通过设计协同通用搜索单元（Co-GSU）和多模态通用搜索单元（MM-GSU），从多视角兴趣搜索中精确捕捉用户终身序列中的多样化兴趣。"
+    },
+    {
+        "title": "DGenCTR: Towards a Universal Generative Paradigm for Click-Through Rate\n  Prediction via Discrete Diffusion",
+        "url": "http://arxiv.org/abs/2508.14500v1",
+        "pub_date": "2025-08-20",
+        "summary": "Recent advances in generative models have inspired the field of recommender systems to explore generative approaches, but most existing research focuses on sequence generation, a paradigm ill-suited for click-through rate (CTR) prediction. CTR models critically depend on a large number of cross-features between the target item and the user to estimate the probability of clicking on the item, and discarding these cross-features will significantly impair model performance. Therefore, to harness the ability of generative models to understand data distributions and thereby alleviate the constraints of traditional discriminative models in label-scarce space, diverging from the item-generation paradigm of sequence generation methods, we propose a novel sample-level generation paradigm specifically designed for the CTR task: a two-stage Discrete Diffusion-Based Generative CTR training framework (DGenCTR). This two-stage framework comprises a diffusion-based generative pre-training stage and a CTR-targeted supervised fine-tuning stage for CTR. Finally, extensive offline experiments and online A/B testing conclusively validate the effectiveness of our framework.",
+        "translated": "生成模型的近期突破激励推荐系统领域探索生成式方法，但现有研究多聚焦于序列生成范式，该范式不适用于点击率（CTR）预测任务。CTR模型的核心在于利用目标项与用户间的大量交叉特征来估算点击概率，舍弃这些交叉特征将导致模型性能显著下降。为此，我们创新性地提出面向CTR任务的样本级生成范式——基于离散扩散的两阶段生成式CTR训练框架（DGenCTR），通过发挥生成模型理解数据分布的优势，缓解传统判别模型在标签稀疏场景下的局限性。该框架包含基于扩散模型的生成式预训练阶段和CTR定向监督微调阶段。最终，大量离线实验与在线A/B测试均验证了本框架的有效性。"
+    },
+    {
+        "title": "Global-Distribution Aware Scenario-Specific Variational Representation\n  Learning Framework",
+        "url": "http://arxiv.org/abs/2508.14493v1",
+        "pub_date": "2025-08-20",
+        "summary": "With the emergence of e-commerce, the recommendations provided by commercial platforms must adapt to diverse scenarios to accommodate users' varying shopping preferences. Current methods typically use a unified framework to offer personalized recommendations for different scenarios. However, they often employ shared bottom representations, which partially hinders the model's capacity to capture scenario uniqueness. Ideally, users and items should exhibit specific characteristics in different scenarios, prompting the need to learn scenario-specific representations to differentiate scenarios. Yet, variations in user and item interactions across scenarios lead to data sparsity issues, impeding the acquisition of scenario-specific representations. To learn robust scenario-specific representations, we introduce a Global-Distribution Aware Scenario-Specific Variational Representation Learning Framework (GSVR) that can be directly applied to existing multi-scenario methods. Specifically, considering the uncertainty stemming from limited samples, our approach employs a probabilistic model to generate scenario-specific distributions for each user and item in each scenario, estimated through variational inference (VI). Additionally, we introduce the global knowledge-aware multinomial distributions as prior knowledge to regulate the learning of the posterior user and item distributions, ensuring similarities among distributions for users with akin interests and items with similar side information. This mitigates the risk of users or items with fewer records being overwhelmed in sparse scenarios. Extensive experimental results affirm the efficacy of GSVR in assisting existing multi-scenario recommendation methods in learning more robust representations.",
+        "translated": "随着电子商务的发展，商业平台提供的推荐服务必须适应多样化场景以满足用户不同的购物偏好。现有方法通常采用统一框架为不同场景提供个性化推荐，但这些方法往往使用共享的底层表征，这在一定程度上限制了模型捕捉场景独特性的能力。理想情况下，用户和商品在不同场景中应呈现特定特征，因此需要学习场景专属表征以区分不同场景。然而，用户与商品在不同场景中的交互差异会导致数据稀疏性问题，阻碍场景专属表征的获取。为学习鲁棒的场景专属表征，我们提出一种全局分布感知的场景专属变分表征学习框架（GSVR），该框架可直接应用于现有多场景推荐方法。具体而言，考虑到有限样本带来的不确定性，我们采用概率模型为每个场景中的用户和商品生成场景专属分布，并通过变分推理（VI）进行估计。此外，我们引入全局知识感知的多项分布作为先验知识，用以规范后验用户和商品分布的学习，确保兴趣相似的用户分布和辅助信息相似的商品分布保持相近性。这种做法有效降低了记录较少的用户或商品在稀疏场景中被淹没的风险。大量实验结果证实，GSVR能有效帮助现有多场景推荐方法学习更具鲁棒性的表征。"
     }
 ]