From 1c48b3c7fbab9e12fd72e4b7e969313d27035950 Mon Sep 17 00:00:00 2001
From: wangrongsheng
Date: Mon, 17 Apr 2023 11:13:27 +0800
Subject: [PATCH] * update 2023-04-17 11:13:27

---
 LLMs-papers.md       | 31 +++++++++++++++++++++++++++++++
 data/2023-04-17.json |  2 +-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/LLMs-papers.md b/LLMs-papers.md
index 2fbe058..b0a3cda 100644
--- a/LLMs-papers.md
+++ b/LLMs-papers.md
@@ -14,3 +14,34 @@ Corpus Models](http://arxiv.org/abs/2112.07055v2)
 9. [Unsupervised Improvement of Factual Knowledge in Language
    Models](http://arxiv.org/abs/2304.01597v1)
 10. [Multilingual Text Classification for Dravidian Languages](http://arxiv.org/abs/2112.01705v1)
+11. [Can Large Language Models design a Robot?](http://arxiv.org/abs/2303.15324v1)
+12. [HinFlair: pre-trained contextual string embeddings for pos tagging and
+    text classification in the Hindi language](http://arxiv.org/abs/2101.06949v1)
+13. [Jigsaw: Large Language Models meet Program Synthesis](http://arxiv.org/abs/2112.02969v1)
+14. [Larger-Scale Transformers for Multilingual Masked Language Modeling](http://arxiv.org/abs/2105.00572v1)
+15. [Exploring Cross-lingual Textual Style Transfer with Large Multilingual
+    Language Models](http://arxiv.org/abs/2206.02252v1)
+16. [In What Languages are Generative Language Models the Most Formal?
+    Analyzing Formality Distribution across Languages](http://arxiv.org/abs/2302.12299v1)
+17. [Low-Resource Language Modelling of South African Languages](http://arxiv.org/abs/2104.00772v1)
+18. [GreenPLM: Cross-lingual pre-trained language models conversion with
+    (almost) no cost](http://arxiv.org/abs/2211.06993v2)
+19. [BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting](http://arxiv.org/abs/2212.09535v1)
+20. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with
+    Knowledge Distillation](http://arxiv.org/abs/2302.08387v1)
+21. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
+22. [Can Character-based Language Models Improve Downstream Task Performance
+    in Low-Resource and Noisy Language Scenarios?](http://arxiv.org/abs/2110.13658v1)
+23. [Learning Translation Quality Evaluation on Low Resource Languages from
+    Large Language Models](http://arxiv.org/abs/2302.03491v1)
+24. [Investigating the Translation Performance of a Large Multilingual
+    Language Model: the Case of BLOOM](http://arxiv.org/abs/2303.01911v1)
+25. [Investigating Robustness of Dialog Models to Popular Figurative Language
+    Constructs](http://arxiv.org/abs/2110.00687v1)
+26. [DPRK-BERT: The Supreme Language Model](http://arxiv.org/abs/2112.00567v1)
+27. [MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural
+    Code Generation](http://arxiv.org/abs/2208.08227v4)
+28. [Subword Segmental Language Modelling for Nguni Languages](http://arxiv.org/abs/2210.06525v1)
+29. [TabLLM: Few-shot Classification of Tabular Data with Large Language
+    Models](http://arxiv.org/abs/2210.10723v2)
+30. [Language Model Supervision for Handwriting Recognition Model Adaptation](http://arxiv.org/abs/1808.01423v1)
diff --git a/data/2023-04-17.json b/data/2023-04-17.json
index 1592c3f..4eec936 100644
--- a/data/2023-04-17.json
+++ b/data/2023-04-17.json
@@ -1 +1 @@
-[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology to translate the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders.
The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in the fairness decision making within ML design and support ML development teams to identify, mitigate and monitor bias at each step of ML systems development. The process also provides guidance on how to explain the always imperfect trade-offs in terms of bias to users.", "output": "Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are however trained on massive amounts of web data, which surfaces the peril of potential harmful biases that may leak in the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal that there exist severe occupational biases of neutral prompts majorly excluding groups of people from results for both models. Such biases can get mitigated by increasing the amount of specification in the prompt itself, although the prompting mitigation will not address discrepancies in image quality or other usages of the model or its representations in other scenarios.
Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age. Finally, an analysis of geographical location representations on everyday situations (e.g., park, food, weddings) shows that for most situations, images generated through default location-neutral prompts are closer and more similar to images generated for locations of United States and Germany.", "output": "Social Biases through the Text-to-Image Generation Lens."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Next generation cars embed intelligent assessment of car driving safety through innovative solutions often based on usage of artificial intelligence. The safety driving monitoring can be carried out using several methodologies widely treated in scientific literature. In this context, the author proposes an innovative approach that uses ad-hoc bio-sensing system suitable to reconstruct the physio-based attentional status of the car driver. To reconstruct the car driver physiological status, the author proposed the use of a bio-sensing probe consisting of a coupled LEDs at Near infrared (NiR) spectrum with a photodetector. This probe placed over the monitored subject allows to detect a physiological signal called PhotoPlethysmoGraphy (PPG). The PPG signal formation is regulated by the change in oxygenated and non-oxygenated hemoglobin concentration in the monitored subject bloodstream which will be directly connected to cardiac activity in turn regulated by the Autonomic Nervous System (ANS) that characterizes the subject's attention level.
This so designed car driver drowsiness monitoring will be combined with further driving safety assessment based on correlated intelligent driving scenario understanding.", "output": "Deep Learning Systems for Advanced Driving Assistance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve nonlinear equations in complex material models. Additionally, strategies are provided to reduce the required order of derivation for obtaining the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or non-active simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as local damage models for interface cracking behavior with a nonlinear softening law.
Finally, we discuss the potential and remaining challenges for future developments of this new approach.", "output": "Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating the local search behavior and obtaining comparable results to the local search algorithms. However, the over-smoothing and the information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits the local search behavior while providing practical scalability.
Using the RELS-DQN model trained on one application, it can generalize to various applications by providing solution values higher than or equal to both the local search algorithms and the existing DQN models while remaining efficient in runtime and memory.", "output": "RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. According to our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios.
Our source code is available at ", "output": "Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting the expert demonstrations in symmetrical environments through an amalgamation of reinforcement and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks.
A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.", "output": "Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Simulation is essential to reinforcement learning (RL) before implementation in the real world, especially for safety-critical applications like robot manipulation. Conventionally, RL agents are sensitive to the discrepancies between the simulation and the real world, known as the sim-to-real gap. The application of domain randomization, a technique used to fill this gap, is limited to the imposition of heuristic-randomized models. We investigate the properties of intrinsic stochasticity of real-time simulation (RT-IS) of off-the-shelf simulation software and its potential to improve the robustness of RL methods and the performance of domain randomization. Firstly, we conduct analytical studies to measure the correlation of RT-IS with the occupation of the computer hardware and validate its comparability with the natural stochasticity of a physical robot. Then, we apply the RT-IS feature in the training of an RL agent. The simulation and physical experiment results verify the feasibility and applicability of RT-IS to robust RL agent design for robot manipulation tasks. The RT-IS-powered robust RL agent outperforms conventional RL agents on robots with modeling uncertainties. It requires fewer heuristic randomization and achieves better generalizability than the conventional domain-randomization-powered agents.
Our findings provide a new perspective on the sim-to-real problem in practical applications like robot manipulation tasks.", "output": "Exploiting Intrinsic Stochasticity of Real-Time Simulation to Facilitate Robust Reinforcement Learning for Robot Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In biomedical research and artificial intelligence, access to large, well-balanced, and representative datasets is crucial for developing trustworthy applications that can be used in real-world scenarios. However, obtaining such datasets can be challenging, as they are often restricted to hospitals and specialized facilities. To address this issue, the study proposes to generate highly realistic synthetic faces exhibiting drug abuse traits through augmentation. The proposed method, called \"3DG-GA\", Deep De-identified anonymous Dataset Generation, uses Genetics Algorithm as a strategy for synthetic faces generation. The algorithm includes GAN artificial face generation, forgery detection, and face recognition. Initially, a dataset of 120 images of actual facial drug abuse is used. By preserving, the drug traits, the 3DG-GA provides a dataset containing 3000 synthetic facial drug abuse images. The dataset will be open to the scientific community, which can reproduce our results and benefit from the generated datasets while avoiding legal or ethical restrictions.", "output": "Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The short-form videos have explosive popularity and have dominated the new social media trends.
Prevailing short-video platforms, e.g., Kuaishou (Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way we consume and create content. For video content creation and understanding, the shot boundary detection (SBD) is one of the most essential components in various scenarios. In this work, we release a new public Short video sHot bOundary deTection dataset, named SHOT, consisting of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos. Leveraging this new data wealth, we propose to optimize the model design for video SBD, by conducting neural architecture search in a search space encapsulating various advanced 3D ConvNets and Transformers. Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when being derived and evaluated on our newly constructed SHOT dataset. Moreover, to validate the generalizability of the AutoShot architecture, we directly evaluate it on another three public datasets: ClipShots, BBC and RAI, and the F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%, 0.9% and 1.2%, respectively. The SHOT dataset and code can be found in .", "output": "AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "IoT is one of the fastest-growing technologies and it is estimated that more than a billion devices would be utilized across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential.
Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed the IoT devices features, such as devices role, device type and its dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. Whilst continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still at its infancy. In this paper, we carry out a comprehensive literature review on state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize the trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises of traditional trust management-based systems and artificial intelligence-based systems, and combine both the classes which encourage the existing schemes to adapt these emerging concepts. This collaboration between the conventional mathematical and the advanced ML models result in design schemes that are more robust and efficient. Then we drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness and efficiency.
Finally, built upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate and point out future research directions.", "output": "IoT trust and reputation: a survey and taxonomy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "ChatGPT has recently gathered attention from the general public and academia as a tool that is able to generate plausible and human-sounding text answers to various questions. One potential use, or abuse, of ChatGPT is in answering various questions or even generating whole essays and research papers in an academic or classroom setting. While recent works have explored the use of ChatGPT in the context of humanities, business school, or medical school, this work explores how ChatGPT performs in the context of an introductory computer engineering course. This work assesses ChatGPT's aptitude in answering quizzes, homework, exam, and laboratory questions in an introductory-level computer engineering course. This work finds that ChatGPT can do well on questions asking about generic concepts. However, predictably, as a text-only tool, it cannot handle questions with diagrams or figures, nor can it generate diagrams and figures. Further, also clearly, the tool cannot do hands-on lab experiments, breadboard assembly, etc., but can generate plausible answers to some laboratory manual questions. One of the key observations presented in this work is that the ChatGPT tool could not be used to pass all components of the course. Nevertheless, it does well on quizzes and short-answer questions.
On the other hand, plausible, human-sounding answers could confuse students when generating incorrect but still plausible answers.", "output": "Analyzing ChatGPT's Aptitude in an Introductory Computer Engineering Course."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity.
The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry.", "output": "AGI for Agriculture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of intercepting a target moving along a rectilinear or circular trajectory by a Dubins' car is formulated as a time-optimal control problem with an arbitrary direction of the car's velocity at the interception moment. To solve this problem and to synthesize interception trajectories, neural network methods of unsupervised learning based on the Deep Deterministic Policy Gradient algorithm are used. The analysis of the obtained control laws and interception trajectories in comparison with the analytical solutions of the interception problem is performed. The mathematical modeling for the parameters of the target movement that the neural network had not seen before during training is carried out. Model experiments are conducted to test the stability of the neural solution. The effectiveness of using neural network methods for the synthesis of interception trajectories for given classes of target movements is shown.", "output": "Neural Network Algorithm for Intercepting Targets Moving Along Known Trajectories by a Dubins' Car."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Current existing vision systems for yield estimation face difficulties in occlusion and scalability as they utilize a camera system that is large and expensive, which are unsuitable for orchard environments.
To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated on a lab environment with real tomato fruits and fake leaves to simulate occlusion in the real farm environment. To improve accuracy by addressing fruit occlusion, our three-camera system was able to achieve a height measurement accuracy of 0.9114 and a width accuracy of 0.9443.", "output": "Visual based Tomato Size Measurement System for an Indoor Farming Environment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets.
We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.", "output": "Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability.
We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.", "output": "Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. NeRF achieves impressive performance in rendering images for novel views similar to the input views while suffering for novel views that are significantly different from the training views. To address this issue, we utilize the holistic priors, including pseudo depth maps and view coverage information, from neural reconstruction to guide the learning of implicit neural representations of 3D indoor scenes. Concretely, an off-the-shelf neural reconstruction method is leveraged to generate a geometry scaffold. Then, two loss functions based on the holistic priors are proposed to improve the learning of NeRF: 1) A robust depth loss that can tolerate the error of the pseudo depth map to guide the geometry learning of NeRF; 2) A variance loss to regularize the variance of implicit neural representations to reduce the geometry and color ambiguity in the learning procedure. These two loss functions are modulated during NeRF optimization according to the view coverage information to reduce the negative influence brought by the view coverage imbalance.
Extensive results demonstrate that our NeRFVS outperforms state-of-the-art view synthesis methods quantitatively and qualitatively on indoor scenes, achieving high-fidelity free navigation results.", "output": "NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Heterogeneous graph neural networks aim to discover discriminative node embeddings and relations from multi-relational networks. One challenge of heterogeneous graph learning is the design of learnable meta-paths, which significantly influences the quality of learned embeddings. Thus, in this paper, we propose an Attributed Multi-Order Graph Convolutional Network (AMOGCN), which automatically studies meta-paths containing multi-hop neighbors from an adaptive aggregation of multi-order adjacency matrices. The proposed model first builds different orders of adjacency matrices from manually designed node connections. After that, an intact multi-order adjacency matrix is attached from the automatic fusion of various orders of adjacency matrices. This process is supervised by the node semantic information, which is extracted from the node homophily evaluated by attributes. Eventually, we utilize a one-layer simplifying graph convolutional network with the learned multi-order adjacency matrix, which is equivalent to the cross-hop node information propagation with multi-layer graph neural networks.
Substantial experiments reveal that AMOGCN gains superior semi-supervised classification performance compared with state-of-the-art competitors.", "output": "Attributed Multi-order Graph Convolutional Network for Heterogeneous Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process. The proposed framework enables swift and robust integration of new datasets, experimentation on different algorithms, and selection of the best models. We start with the datasets of different issues and apply pre-processing steps to clean and engineer meaningful representations of time-series data. To identify robust training configurations, we introduce a novel mechanism of multiple cross-validation strategies. We apply different evaluation metrics to find the best-suited models for varying applications. One of the referent applications is our participation in the intelligent forecasting competition held by the United States Agency for International Development (USAID). Finally, we leverage the flexibility of the framework by applying different evaluation metrics to assess the performance of the models in inventory management settings.", "output": "Streamlined Framework for Agile Forecasting Model Development towards Efficient Inventory Management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The structural re-parameterization (SRP) technique is a novel deep learning technique that achieves interconversion between different network architectures through equivalent parameter transformations.
This technique enables the mitigation of the extra costs for performance improvement during training, such as parameter size and inference time, through these transformations during inference, and therefore SRP has great potential for industrial and practical applications. The existing SRP methods have successfully considered many commonly used architectures, such as normalizations, pooling methods, multi-branch convolution. However, the widely used self-attention modules cannot be directly implemented by SRP because these modules usually act on the backbone network in a multiplicative manner and the modules' output is input-dependent during inference, which limits the application scenarios of SRP. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training. This observation inspires us to propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the self-attention mechanism. Extensive experiments conducted on several standard benchmarks demonstrate the effectiveness of ASR in generally improving the performance of existing backbone networks, self-attention modules, and SRP methods without any elaborated model crafting. We also analyze the limitations and provide experimental or theoretical evidence for the strong robustness of the proposed ASR.", "output": "ASR: Attention-alike Structural Re-parameterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases.
However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly fewer computational costs.", "output": "DDT: Dual-branch Deformable Transformer for Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a generic framework for establishing the decidability of a wide range of logical entailment problems (briefly called querying), based on the existence of countermodels that are structurally simple, gauged by certain types of width measures (with treewidth and cliquewidth as popular examples). As an important special case of our framework, we identify logics exhibiting width-finite finitely universal model sets, warranting decidable entailment for a wide range of homomorphism-closed queries, subsuming a diverse set of practically relevant query languages. As a particularly powerful width measure, we propose Blumensath's partitionwidth, which subsumes various other commonly considered width measures and exhibits highly favorable computational and structural properties.
Focusing on the formalism of existential rules as a popular showcase, we explain how finite partitionwidth sets of rules subsume other known abstract decidable classes but -- leveraging existing notions of stratification -- also cover a wide range of new rulesets. We expose natural limitations for fitting the class of finite unification sets into our picture and provide several options for remedy.", "output": "Decidability of Querying First-Order Theories via Countermodels of Finite Width."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Evaluating the general abilities of foundation models to tackle human-level tasks is a vital aspect of their development and application in the pursuit of Artificial General Intelligence (AGI). Traditional benchmarks, which rely on artificial datasets, may not accurately represent human-level capabilities. In this paper, we introduce AGIEval, a novel benchmark specifically designed to assess foundation models in the context of human-centric standardized exams, such as college entrance exams, law school admission tests, math competitions, and lawyer qualification tests. We evaluate several state-of-the-art foundation models, including GPT-4, ChatGPT, and Text-Davinci-003, using this benchmark. Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam. This demonstrates the extraordinary performance of contemporary foundation models. In contrast, we also find that GPT-4 is less proficient in tasks that require complex reasoning or specific domain knowledge.
Our comprehensive analyses of model capabilities (understanding, knowledge, reasoning, and calculation) reveal these models' strengths and limitations, providing valuable insights into future directions for enhancing their general capabilities. By concentrating on tasks pertinent to human cognition and decision-making, our benchmark delivers a more meaningful and robust evaluation of foundation models' performance in real-world scenarios. The data, code, and all model outputs are released in ", "output": "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output.
We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.", "output": "IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic networks provide a useful tool to understand how related concepts are retrieved from memory. However, most current network approaches use pairwise links to represent memory recall patterns. Pairwise connections neglect higher-order associations, i.e. relationships between more than two concepts at a time. These higher-order interactions might covariate with (and thus contain information about) how similar concepts are along psycholinguistic dimensions like arousal, valence, familiarity, gender and others. We overcome these limits by introducing feature-rich cognitive hypergraphs as quantitative models of human memory where: (i) concepts recalled together can all engage in hyperlinks involving also more than two concepts at once (cognitive hypergraph aspect), and (ii) each concept is endowed with a vector of psycholinguistic features (feature-rich aspect). We build hypergraphs from word association data and use evaluation methods from machine learning features to predict concept concreteness. Since concepts with similar concreteness tend to cluster together in human memory, we expect to be able to leverage this structure. Using word association data from the Small World of Words dataset, we compared a pairwise network and a hypergraph with N=3586 concepts/nodes.
Interpretable artificial intelligence models trained on (1) psycholinguistic features only, (2) pairwise-based feature aggregations, and on (3) hypergraph-based aggregations show significant differences between pairwise and hypergraph links. Specifically, our results show that higher-order and feature-rich hypergraph models contain richer information than pairwise networks leading to improved prediction of word concreteness. The relation with previous studies about conceptual clustering and compartmentalisation in associative knowledge and human memory is discussed.", "output": "Towards hypergraph cognitive networks as feature-rich models of knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, planning, etc., is essential and unique for human intelligence. Current deep neural networks are still far behind human's ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with the ability of symbol creation, semantic understanding and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we found that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks.
We hope that it will be instrumental in producing more capable systems in the future that can synergize the strengths of connectionist and symbolic approaches for AI.", "output": "Emergence of Symbols in Neural Networks for Semantic Understanding and Communication."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method, by introducing a faithfulness task, and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open source our implementation:", "output": "VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data.
Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations.", "output": "Leveraging triplet loss for unsupervised action segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Predicting high-fidelity future human poses, from a historically observed sequence, is decisive for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, at test-time, once encountering unseen motion categories (out-of-distribution), the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence.
In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically tend to have a low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In terms of general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), achieves significant improvements.", "output": "Meta-Auxiliary Learning for Adaptive Human Pose Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a comprehensive, multi-stage machine learning methodology that effectively integrates information systems and artificial intelligence to enhance decision-making processes within the domain of operations research. The proposed framework adeptly addresses common limitations of existing solutions, such as the neglect of data-driven estimation for vital production parameters, exclusive generation of point forecasts without considering model uncertainty, and lacking explanations regarding the sources of such uncertainty. Our approach employs Quantile Regression Forests for generating interval predictions, alongside both local and global variants of SHapley Additive Explanations for the examined predictive process monitoring problem. The practical applicability of the proposed methodology is substantiated through a real-world production planning case study, emphasizing the potential of prescriptive analytics in refining decision-making procedures.
This paper accentuates the imperative of addressing these challenges to fully harness the extensive and rich data resources accessible for well-informed decision-making.", "output": "Quantifying and Explaining Machine Learning Uncertainty in Predictive Process Monitoring: An Operations Research Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a systematic investigation into the effectiveness of Self-Supervised Learning (SSL) methods for Electrocardiogram (ECG) arrhythmia detection. We begin by conducting a novel distribution analysis on three popular ECG-based arrhythmia datasets: PTB-XL, Chapman, and Ribeiro. To the best of our knowledge, our study is the first to quantify these distributions in this area. We then perform a comprehensive set of experiments using different augmentations and parameters to evaluate the effectiveness of various SSL methods, namely SimCLR, BYOL, and SwAV, for ECG representation learning, where we observe the best performance achieved by SwAV. Furthermore, our analysis shows that SSL methods achieve highly competitive results to those achieved by supervised state-of-the-art methods. To further assess the performance of these methods on both In-Distribution (ID) and Out-of-Distribution (OOD) ECG data, we conduct cross-dataset training and testing experiments. Our comprehensive experiments show almost identical results when comparing ID and OOD schemes, indicating that SSL techniques can learn highly effective representations that generalize well across different OOD datasets. This finding can have major implications for ECG-based arrhythmia detection.
Lastly, to further analyze our results, we perform detailed per-disease studies on the performance of the SSL methods on the three datasets.", "output": "In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO techniques. To this end, we propose a certified ZO preprocessing technique for removing adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, by prepending our RDUNet to the black-box model, ensuring black-box defense. We further propose ZO-AE-RUDS in which RDUNet followed by autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-100, Tiny Imagenet, STL-10, and the MNIST dataset for image reconstruction tasks.
Our proposed defense methods ZO-RUDS and ZO-AE-RUDS beat SOTA with a huge margin of 35% and 9%, for low-dimensional (CIFAR-10) and with a margin of 20.61% and 23.51% for high-dimensional (STL-10) datasets, respectively.", "output": "Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision transformers have been applied successfully for image recognition tasks. There have been either multi-headed self-attention based (ViT \\cite{dosovitskiy2020image}, DeiT \\cite{touvron2021training}) similar to the original work in textual models or more recently based on spectral layers (FNet \\cite{lee2021fnet}, GFNet \\cite{rao2021global}, AFNO \\cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention plays a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, SpectFormer-L achieves 85.7% that is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Stanford Car datasets.
We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.", "output": "SpectFormer: Frequency and Attention is what you need in a Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For task C, we make use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages.
We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.", "output": "Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Preserving the individuals' privacy in sharing spatial-temporal datasets is critical to prevent re-identification attacks based on unique trajectories. Existing privacy techniques tend to propose ideal privacy-utility tradeoffs; however, they largely ignore the fairness implications of mobility models and whether such techniques perform equally for different groups of users. The quantification between fairness and privacy-aware models is still unclear and there barely exists any defined sets of metrics for measuring fairness in the spatial-temporal context. In this work, we define a set of fairness metrics designed explicitly for human mobility, based on structural similarity and entropy of the trajectories. Under these definitions, we examine the fairness of two state-of-the-art privacy-preserving models that rely on GAN and representation learning to reduce the re-identification rate of users for data sharing. Our results show that while both models guarantee group fairness in terms of demographic parity, they violate individual fairness criteria, indicating that users with highly similar trajectories receive disparate privacy gain.
We conclude that the tension between the re-identification task and individual fairness needs to be considered for future spatial-temporal data analysis and modelling to achieve a privacy-preserving fairness-aware setting.", "output": "Analysing Fairness of Privacy-Utility Mobility Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deep fakes. The prevalence of deep fakes in today's society is a serious concern, and our findings can help mitigate their negative impact.", "output": "Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, often multiple sources of information are available beyond EEG.
This can be particularly beneficial when the EEG recordings are noisy or even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modality segments, allowing training on incomplete data. Additionally, it shows state-of-the-art performance when testing on both multimodal and unimodal data using a single model on SHHS-1, the largest publicly available study that includes sleep stage labels. The results indicate that training the model on multimodal data does positively influence performance when tested on unimodal data. This work aims at bridging the gap between automated analysis tools and their clinical utility.", "output": "CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges.
Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.", "output": "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning-based wearable human activity recognition (WHAR) models enable the development of various smart and connected community applications such as sleep pattern monitoring, medication reminders, cognitive health assessment, sports analytics, etc. However, the widespread adoption of these WHAR models is impeded by their degraded performance in the presence of data distribution heterogeneities caused by the sensor placement at different body positions, inherent biases and heterogeneities across devices, and personal and environmental diversities. Various traditional machine learning algorithms and transfer learning techniques have been proposed in the literature to address the underpinning challenges of handling such data heterogeneities. Domain adaptation is one such transfer learning technique that has gained significant popularity in recent literature.
In this paper, we survey the recent progress of domain adaptation techniques in the Inertial Measurement Unit (IMU)-based human activity recognition area, and discuss potential future directions.", "output": "Domain Adaptation for Inertial Measurement Unit-based Human Activity Recognition: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the Proposal for a Regulation laying down harmonised rules on Artificial Intelligence (AI Act) the European Union provides the first regulatory document that applies to the entire complex of AI systems. While some fear that the regulation leaves too much room for interpretation and thus brings little benefit to society, others expect that the regulation is too restrictive and, thus, blocks progress and innovation, as well as hinders the economic success of companies within the EU. Without a systematic approach, it is difficult to assess how it will actually impact the AI landscape. In this paper, we suggest a systematic approach that we applied to the initial draft of the AI Act that has been released in April 2021. We went through several iterations of compiling the list of AI products and projects in and from Germany, which the Lernende Systeme platform lists, and then classified them according to the AI Act together with experts from the fields of computer science and law. Our study shows a need for more concrete formulation, since for some provisions it is often unclear whether they are applicable in a specific case or not. Apart from that, it turns out that only about 30% of the AI systems considered would be regulated by the AI Act, the rest would be classified as low-risk. However, as the database is not representative, the results only provide a first assessment. The process presented can be applied to any collections, and also repeated when regulations are about to change.
This allows fears of over- or under-regulation to be investigated before the regulation comes into effect.", "output": "Quantitative study about the estimated impact of the AI Act."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Objective digital data is scarce yet needed in many domains to enable research that can transform the standard of healthcare. While data from consumer-grade wearables and smartphones is more accessible, there is critical need for similar data from clinical-grade devices used by patients with a diagnosed condition. The prevalence of wearable medical devices in the diabetes domain sets the stage for unique research and development within this field and beyond. However, the scarcity of open-source datasets presents a major barrier to progress. To facilitate broader research on diabetes-relevant problems and accelerate development of robust computational solutions, we provide the DiaTrend dataset. The DiaTrend dataset is composed of intensive longitudinal data from wearable medical devices, including a total of 27,561 days of continuous glucose monitor data and 8,220 days of insulin pump data from 54 patients with diabetes.
This dataset is useful for developing novel analytic solutions that can reduce the disease burden for people living with diabetes and increase knowledge on chronic condition management in outpatient settings.", "output": "DiaTrend: A dataset from advanced diabetes technology to enable development of novel analytic solutions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This chapter presents some of the fundamental assumptions and principles that could form the philosophical foundation of GeoAI and spatial data science. Instead of reviewing the well-established characteristics of spatial data (analysis), including interaction, neighborhoods, and autocorrelation, the chapter highlights themes such as sustainability, bias in training data, diversity in schema knowledge, and the (potential lack of) neutrality of GeoAI systems from a unifying ethical perspective. Reflecting on our profession's ethical implications will assist us in conducting potentially disruptive research more responsibly, identifying pitfalls in designing, training, and deploying GeoAI-based systems, and developing a shared understanding of the benefits but also potential dangers of artificial intelligence and machine learning research across academic fields, all while sharing our unique (geo)spatial perspective with others.", "output": "Philosophical Foundations of GeoAI: Exploring Sustainability, Diversity, and Bias in GeoAI and Spatial Data Science."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Passive radio frequency (PRF)-based indoor positioning systems (IPS) have attracted researchers' attention due to their low price, easy and customizable configuration, and non-invasive design.
This paper proposes a PRF-based three-dimensional (3D) indoor positioning system (PIPS), which is able to use signals of opportunity (SoOP) for positioning and also capture a scenario signature. PIPS passively monitors SoOPs containing scenario signatures through a single receiver. Moreover, PIPS leverages the Dynamic Data Driven Applications System (DDDAS) framework to devise and customize the sampling frequency, enabling the system to use the most impacted frequency band as the rated frequency band. Various regression methods within three ensemble learning strategies are used to train and predict the receiver position. The PRF spectrum of 60 positions is collected in the experimental scenario, and three criteria are applied to evaluate the performance of PIPS. Experimental results show that the proposed PIPS possesses the advantages of high accuracy, configurability, and robustness.", "output": "Passive Radio Frequency-based 3D Indoor Positioning System via Ensemble Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers reliable and secure Spectrum Sensing (SS) based on Federated Learning (FL) in the Cognitive Radio (CR) environment. Motivation, architectures, and algorithms of FL in SS are discussed. Security and privacy threats on these algorithms are overviewed, along with possible countermeasures to such attacks.
Some illustrative examples are also provided, with design recommendations for FL-based SS in future CRs.", "output": "Secure Federated Learning for Cognitive Radio Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes them to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.", "output": "An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Power-seeking behavior is a key source of risk from advanced AI, but our theoretical understanding of this phenomenon is relatively limited.
Building on existing theoretical results demonstrating power-seeking incentives for most reward functions, we investigate how the training process affects power-seeking incentives and show that they are still likely to hold for trained agents under some simplifying assumptions. We formally define the training-compatible goal set (the set of goals consistent with the training rewards) and assume that the trained agent learns a goal from this set. In a setting where the trained agent faces a choice to shut down or avoid shutdown in a new situation, we prove that the agent is likely to avoid shutdown. Thus, we show that power-seeking incentives can be probable (likely to arise for trained agents) and predictive (allowing us to predict undesirable behavior in new situations).", "output": "Power-seeking can be probable and predictive for trained agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper assesses the accuracy, reliability and bias of the Large Language Model (LLM) ChatGPT-4 on the text analysis task of classifying the political affiliation of a Twitter poster based on the content of a tweet. The LLM is compared to manual annotation by both expert classifiers and crowd workers, generally considered the gold standard for such tasks. We use Twitter messages from United States politicians during the 2020 election, providing a ground truth against which to measure accuracy. The paper finds that ChatGPT-4 achieves higher accuracy, higher reliability, and equal or lower bias than the human classifiers. The LLM is able to correctly annotate messages that require reasoning on the basis of contextual knowledge, and inferences around the author's intentions - traditionally seen as uniquely human abilities.
These findings suggest that LLMs will have a substantial impact on the use of textual data in the social sciences, by enabling interpretive research at scale.", "output": "ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms play an important role in a variety of important decision-making processes, including targeted advertisement displays, home loan approvals, and criminal behavior predictions. Given the far-reaching impact of these algorithms, it is crucial that they operate fairly, free from bias or prejudice towards certain groups in the population. Ensuring impartiality in these algorithms is essential for promoting equality and avoiding discrimination. To this end we introduce a unified framework for randomized subset selection that incorporates group fairness constraints. Our problem involves a global utility function and a set of group utility functions for each group, where a group refers to a group of individuals (e.g., people) sharing the same attributes (e.g., gender). Our aim is to generate a distribution across feasible subsets, specifying the selection probability of each feasible set, to maximize the global utility function while meeting a predetermined quota for each group utility function in expectation. Note that there may not necessarily be any direct connections between the global utility function and each group utility function. We demonstrate that this framework unifies and generalizes many significant applications in machine learning and operations research.
Our algorithmic results either improve the best known result or provide the first approximation algorithms for new applications.", "output": "Beyond Submodularity: A Unified Framework of Randomized Set Selection with Group Fairness Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural network (DNN) models are valuable intellectual property of model owners, constituting a competitive advantage. Therefore, it is crucial to develop techniques to protect against model theft. Model ownership resolution (MOR) is a class of techniques that can deter model theft. A MOR scheme enables an accuser to assert an ownership claim for a suspect model by presenting evidence, such as a watermark or fingerprint, to show that the suspect model was stolen or derived from a source model owned by the accuser. Most of the existing MOR schemes prioritize robustness against malicious suspects, ensuring that the accuser will win if the suspect model is indeed a stolen model. In this paper, we show that common MOR schemes in the literature are vulnerable to a different, equally important but insufficiently explored, robustness concern: a malicious accuser. We show how malicious accusers can successfully make false claims against independent suspect models that were not stolen. Our core idea is that a malicious accuser can deviate (without detection) from the specified MOR process by finding (transferable) adversarial examples that successfully serve as evidence against independent suspect models. To this end, we first generalize the procedures of common MOR schemes and show that, under this generalization, defending against false claims is as challenging as preventing (transferable) adversarial examples.
Via systematic empirical evaluation we demonstrate that our false claim attacks always succeed in all prominent MOR schemes with realistic configurations, including against a real-world model: Amazon's Rekognition API.", "output": "False Claims against Model Ownership Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To address the challenges of digital intelligence in the digital economy, artificial intelligence-generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace manual content generation by generating content based on user-inputted keywords or requirements. The development of large model algorithms has significantly strengthened the capabilities of AIGC, which makes AIGC products a promising generative tool and adds convenience to our lives. As an upstream technology, AIGC has unlimited potential to support different downstream applications. It is important to analyze AIGC's current capabilities and shortcomings to understand how it can be best utilized in future applications. Therefore, this paper provides an extensive overview of AIGC, covering its definition, essential conditions, cutting-edge capabilities, and advanced features.
Moreover, it discusses the benefits of large-scale pre-trained models and the industrial chain of AIGC. Furthermore, the article explores the distinctions between auxiliary generation and automatic generation within AIGC, providing examples of text generation. The paper also examines the potential integration of AIGC with the Metaverse. Lastly, the article highlights existing issues and suggests some future directions for application.", "output": "AI-Generated Content (AIGC): A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent approaches have attempted to personalize dialogue systems by leveraging profile information into models. However, this knowledge is scarce and difficult to obtain, which makes the extraction/generation of profile information from dialogues a fundamental asset. To surpass this limitation, we introduce the Profile Generation Task (PGTask). We contribute with a new dataset for this problem, comprising profile sentences aligned with related utterances, extracted from a corpus of dialogues. Furthermore, using state-of-the-art methods, we provide a benchmark for profile generation on this novel dataset. Our experiments disclose the challenges of profile generation, and we hope that this introduces a new research direction.", "output": "PGTask: Introducing the Task of Profile Generation from Dialogues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controllable text generation (CTG) by large language models has a huge potential to transform education for teachers and students alike. Specifically, high quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content.
Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the questions have errors and/or pedagogically unhelpful content. We conduct a human evaluation with teachers to assess the quality and usefulness of outputs from combining CTG and question taxonomies (Bloom's and a difficulty taxonomy). The results demonstrate that the questions generated are high quality and sufficiently useful, showing their promise for widespread use in the classroom setting.", "output": "How Useful are Educational Questions Generated by Large Language Models?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent.
This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.", "output": "Do deep neural networks have an inbuilt Occam's razor?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) techniques seek to learn the underlying patterns in data using fewer samples, analogous to how humans learn from limited experience. In this limited-data scenario, the challenges associated with deep neural networks, such as shortcut learning and texture bias behaviors, are further exacerbated. Moreover, the significance of addressing shortcut learning is not yet fully explored in the few-shot setup. To address these issues, we propose LSFSL, which enforces the model to learn more generalizable features utilizing the implicit prior information present in the data. Through comprehensive analyses, we demonstrate that LSFSL-trained models are less vulnerable to alteration in color schemes, statistical correlations, and adversarial perturbations leveraging the global semantics in the data. Our findings highlight the potential of incorporating relevant priors in few-shot approaches to increase robustness and generalization.", "output": "LSFSL: Leveraging Shape Information in Few-shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Individual human decision-makers may benefit from different forms of support to improve decision outcomes. However, a key question is which form of support will lead to accurate decisions at a low cost. In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide.
We consider decision-makers for whom we have no prior information and formalize learning their respective policies as a multi-objective optimization problem that trades off accuracy and cost. Using techniques from stochastic contextual bandits, we propose $\\texttt{THREAD}$, an online algorithm to personalize a decision support policy for each decision-maker, and devise a hyper-parameter tuning strategy to identify a cost-performance trade-off using simulated human behavior. We provide computational experiments to demonstrate the benefits of $\\texttt{THREAD}$ compared to offline baselines. We then introduce $\\texttt{Modiste}$, an interactive tool that provides $\\texttt{THREAD}$ with an interface. We conduct human subject experiments to show how $\\texttt{Modiste}$ learns policies personalized to each decision-maker and discuss the nuances of learning decision support policies online for real users.", "output": "Learning Personalized Decision Support Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to estimate the mechanical parameters of fabrics using a casual capture setup with a depth camera. Our approach enables the creation of mechanically-correct digital representations of real-world textile materials, which is a fundamental step for many interactive design and engineering applications. As opposed to existing capture methods, which typically require expensive setups, video sequences, or manual intervention, our solution can capture at scale, is agnostic to the optical appearance of the textile, and facilitates fabric arrangement by non-expert operators.
To this end, we propose a sim-to-real strategy to train a learning-based framework that can take as input one or multiple images and outputs a full set of mechanical parameters. Thanks to carefully designed data augmentation and transfer learning protocols, our solution generalizes to real images despite being trained only on synthetic data, hence successfully closing the sim-to-real loop. Key in our work is to demonstrate that evaluating the regression accuracy based on similarity in parameter space leads to inaccurate distances that do not match human perception. To overcome this, we propose a novel metric for fabric drape similarity that operates on the image domain instead of on the parameter space, allowing us to evaluate our estimation within the context of a similarity rank. We show that our metric correlates with human judgments about the perception of drape similarity, and that our model predictions produce perceptually accurate results compared to the ground truth parameters.", "output": "How Will It Drape Like? Capturing Fabric Mechanics from Depth Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting their performance in real-world video applications that require action and temporal understanding. In this work, we improve verb understanding for CLIP-based video-language models by proposing a new Verb-Focused Contrastive (VFC) framework.
This consists of two main components: (1) leveraging pretrained large language models (LLMs) to create hard negatives for cross-modal contrastive learning, together with a calibration strategy to balance the occurrence of concepts in positive and negative pairs; and (2) enforcing a fine-grained, verb phrase alignment loss. Our method achieves state-of-the-art results for zero-shot performance on three downstream tasks that focus on verb understanding: video-text matching, video question-answering and video classification. To the best of our knowledge, this is the first work which proposes a method to alleviate the verb understanding problem, and does not simply highlight it.", "output": "Verbs in Action: Improving verb understanding in video-language models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group.
By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.", "output": "Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of {\\it dependent compositional functions} with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with {\\it provable convergence guarantee under mild conditions} by leveraging recent advances in stochastic compositional optimization. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence.
The SOAP has been implemented in the libAUC library at~\\url{", "output": "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As consensus across the various published AI ethics principles is approached, a gap remains between high-level principles and practical techniques that can be readily adopted to design and develop responsible AI systems. We examine the practices and experiences of researchers and engineers from Australia's national scientific research agency (CSIRO), who are involved in designing and developing AI systems for many application areas. Semi-structured interviews were used to examine how the practices of the participants relate to and align with a set of high-level AI ethics principles proposed by the Australian Government. The principles comprise: (1) privacy protection and security, (2) reliability and safety, (3) transparency and explainability, (4) fairness, (5) contestability, (6) accountability, (7) human-centred values, (8) human, social and environmental wellbeing. Discussions on the gained insights from the interviews include various tensions and trade-offs between the principles, and provide suggestions for implementing each high-level principle. We also present suggestions aiming to enhance associated support mechanisms.", "output": "AI Ethics Principles in Practice: Perspectives of Designers and Developers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Natural language processing (NLP) aims at investigating the interactions between agents and humans, processing and analyzing large amounts of natural language data. Large-scale language models play an important role in current natural language processing.
However, the challenges of explainability and complexity come along with the developments of language models. One way is to introduce logical relations and rules into natural language processing models, such as making use of Automated Planning. Automated planning (AI planning) focuses on building symbolic domain models and synthesizing plans to transit initial states to goals based on domain models. Recently, there have been plenty of works related to these two fields, which have the abilities to generate explicit knowledge, e.g., preconditions and effects of action models, and learn from tacit knowledge, e.g., neural models, respectively. Integrating AI planning and natural language processing effectively improves the communication between humans and intelligent agents. This paper outlines the commons and relations between AI planning and natural language processing, and argues that each can effectively impact the other in five areas: (1) planning-based text understanding, (2) planning-based natural language processing, (3) planning-based explainability, (4) text-based human-robot interaction, and (5) applications. We also explore some potential future issues between AI planning and natural language processing. To the best of our knowledge, this survey is the first work that addresses the deep connections between AI planning and natural language processing.", "output": "Integrating AI Planning with Natural Language Processing: A Combination of Explicit and Tacit Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When cast into the Deep Reinforcement Learning framework, many robotics tasks require solving a long horizon and sparse reward problem, where learning algorithms struggle. In such a context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process.
However, most IL methods require several expert demonstrations which can be prohibitively difficult to acquire. Only a handful of IL algorithms have shown efficiency in the context of an extremely low expert data regime where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned into a goal-conditioned policy that is able to solve each skill individually and chain skills to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.", "output": "Divide & Conquer Imitation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps.
Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction while even amateur human testers on MTurk have 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.", "output": "Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead we advocate a method called Cohort Shapley that is grounded in economic game theory and unlike most other game theoretic methods, it uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features.
We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on.", "output": "Variable importance without impossible data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Incorporating personal preference is crucial in advanced machine translation tasks. Despite the recent advancement of machine translation, it remains a demanding task to properly reflect personal style. In this paper, we introduce a personalized automatic post-editing framework to address this challenge, which effectively generates sentences considering distinct personal behaviors. To build this framework, we first collect post-editing data that connotes the user preference from a live machine translation system. Specifically, real-world users enter source sentences for translation and edit the machine-translated outputs according to the user's preferred style. We then propose a model that combines a discriminator module and user-specific parameters on the APE framework. Experimental results show that the proposed method outperforms other baseline models on four different metrics (i.e., BLEU, TER, YiSi-1, and human evaluation).", "output": "PePe: Personalized Post-editing Model utilizing User-generated Post-edits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How can we design protein sequences folding into the desired structures effectively and efficiently? AI methods for structure-based protein design have attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder.
To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold could achieve 51.66% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, PiFold achieves 58.72% and 60.42% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement. The PyTorch code is available at href{", "output": "PiFold: Toward effective and efficient protein inverse folding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian neural networks (BNNs) have received an increased interest in the last years. In BNNs, a complete posterior distribution of the unknown weight and bias parameters of the network is produced during the training stage. This probabilistic estimation offers several advantages with respect to point-wise estimates, in particular, the ability to provide uncertainty quantification when predicting new data. This feature, inherent to the Bayesian paradigm, is useful in countless machine learning applications. It is particularly appealing in areas where decision-making has a crucial impact, such as medical healthcare or autonomous driving. The main challenge of BNNs is the computational cost of the training procedure since Bayesian techniques often face a severe curse of dimensionality. Adaptive importance sampling (AIS) is one of the most prominent Monte Carlo methodologies benefiting from sounded convergence guarantees and ease for adaptation. This work aims to show that AIS constitutes a successful approach for designing BNNs.
More precisely, we propose a novel algorithm PMCnet that includes an efficient adaptation mechanism, exploiting geometric information on the complex (often multimodal) posterior distribution. Numerical results illustrate the excellent performance and the improved exploration capabilities of the proposed method for both shallow and deep neural networks.", "output": "Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps.
For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.", "output": "On Distillation of Guided Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal especially 3D reasoning models.
We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.", "output": "SQA3D: Situated Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of mobility-on-demand services, rich transportation data sources, and autonomous vehicles (AVs) creates significant opportunities for shared-use AV mobility services (SAMSs) to provide accessible and demand-responsive personal mobility. SAMS fleet operation involves multiple interrelated decisions, with a primary focus on efficiently fulfilling passenger ride requests with a high level of service quality. This paper focuses on improving the efficiency and service quality of a SAMS vehicle fleet via anticipatory repositioning of idle vehicles. The rebalancing problem is formulated as a Markov Decision Process, which we propose solving using an advantage actor critic (A2C) reinforcement learning-based method. The proposed approach learns a rebalancing policy that anticipates future demand and cooperates with an optimization-based assignment strategy. The approach allows for centralized repositioning decisions and can handle large vehicle fleets since the problem size does not change with the fleet size. Using New York City taxi data and an agent-based simulation tool, two versions of the A2C AV repositioning approach are tested. The first version, A2C-AVR(A), learns to anticipate future demand based on past observations, while the second, A2C-AVR(B), uses demand forecasts.
The models are compared to an optimization-based rebalancing approach and show significant reduction in mean passenger waiting times, with a slightly increased percentage of empty fleet miles travelled. The experiments demonstrate the model's ability to anticipate future demand and its transferability to cases unseen at the training stage.", "output": "Anticipatory Fleet Repositioning for Shared-use Autonomous Mobility Services: An Optimization and Learning-Based Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation and other content that could be identified as harmful. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of ``truth\" used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high accuracy and performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcing of false beliefs. Using collaborative ethnography and theoretical insights from social studies of science and expertise, we offer a critical analysis of the process of building ML models for (mis)information classification: we identify a series of algorithmic contingencies--key moments during model development that could lead to different future outcomes, uncertainty and harmful effects as these tools are deployed by social media platforms.
We conclude by offering a tentative path toward reflexive and responsible development of ML tools for moderating misinformation and other harmful content online.", "output": "Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.", "output": "Iterative Teaching by Data Hallucination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely.
For instance, an institution might provide a dataset of chest radiographs containing labels denoting the presence of pneumonia, while another institution might have a focus on determining the presence of metastases in the lung. Training a single AI model utilizing all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL) for collaborative training on such data. Using 695,000 chest radiographs from five institutions from across the globe - each with differing labels - we demonstrate that having heterogeneously labeled datasets, FFL-based training leads to a significant performance increase compared to conventional FL training, where only the uniformly annotated images are utilized. We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from the research and simulation phase to real-world applications in healthcare.", "output": "Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a widely used technique in machine learning to improve model performance. However, existing data augmentation techniques in natural language understanding (NLU) may not fully capture the complexity of natural language variations, and they can be challenging to apply to large datasets. This paper proposes the Random Position Noise (RPN) algorithm, a novel data augmentation technique that operates at the word vector level. RPN modifies the word embeddings of the original text by introducing noise based on the existing values of selected word vectors, allowing for more fine-grained modifications and better capturing natural language variations.
Unlike traditional data augmentation methods, RPN does not require gradients in the computational graph during virtual sample updates, making it simpler to apply to large datasets. Experimental results demonstrate that RPN consistently outperforms existing data augmentation techniques across various NLU tasks, including sentiment analysis, natural language inference, and paraphrase detection. Moreover, RPN performs well in low-resource settings and is applicable to any model featuring a word embeddings layer. The proposed RPN algorithm is a promising approach for enhancing NLU performance and addressing the challenges associated with traditional data augmentation techniques in large-scale NLU tasks. Our experimental results demonstrated that the RPN algorithm achieved state-of-the-art performance in all seven NLU tasks, thereby highlighting its effectiveness and potential for real-world NLU applications.", "output": "RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This work analyzes a large set of output similarity metrics on offensive code generators.
We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.", "output": "Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes.
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.", "output": "Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable.
For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.", "output": "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI advice is becoming increasingly popular, e.g., in investment and medical treatment decisions. As this advice is typically imperfect, decision-makers have to exert discretion as to whether to actually follow that advice: they have to \"appropriately\" rely on correct and turn down incorrect advice. However, current research on appropriate reliance still lacks a common definition as well as an operational measurement concept. Additionally, no in-depth behavioral experiments have been conducted that help understand the factors influencing this behavior. In this paper, we propose Appropriateness of Reliance (AoR) as an underlying, quantifiable two-dimensional measurement concept. We develop a research model that analyzes the effect of providing explanations for AI advice. In an experiment with 200 participants, we demonstrate how these explanations influence the AoR, and, thus, the effectiveness of AI advice.
Our work contributes fundamental concepts for the analysis of reliance behavior and the purposeful design of AI advisors.", "output": "Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT).
This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.", "output": "Towards Inferential Reproducibility of Machine Learning Research."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.", "output": "With Shared Microexponents, A Little Shifting Goes a Long Way."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Direct physical interaction with robots is becoming increasingly important in flexible production scenarios, but robots without protective fences also pose a greater risk to the operator.
In order to keep the risk potential low, relatively simple measures are prescribed for operation, such as stopping the robot if there is physical contact or if a safety distance is violated. Although human injuries can be largely avoided in this way, all such solutions have in common that real cooperation between humans and robots is hardly possible and therefore the advantages of working with such systems cannot develop their full potential. In human-robot collaboration scenarios, more sophisticated solutions are required that make it possible to adapt the robot's behavior to the operator and/or the current situation. Most importantly, during free robot movement, physical contact must be allowed for meaningful interaction and not recognized as a collision. However, here lies a key challenge for future systems: detecting human contact by using robot proprioception and machine learning algorithms. This work uses the Deep Metric Learning (DML) approach to distinguish between non-contact robot movement, intentional contact aimed at physical human-robot interaction, and collision situations. The achieved results are promising and show that DML achieves 98.6% accuracy, which is 4% higher than the existing standards (i.e. a deep learning network trained without DML).
It also indicates a promising generalization capability for easy portability to other robots (target robots) by detecting contact (distinguishing between contactless and intentional or accidental contact) without having to retrain the model with target robot data.", "output": "Improving safety in physical human-robot collaboration via deep metric learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning agent behaviors from observational data has shown to improve our understanding of their decision-making processes, advancing our ability to explain their interactions with the environment and other agents. While multiple learning techniques have been proposed in the literature, there is one particular setting that has not been explored yet: multi agent systems where agent identities remain anonymous. For instance, in financial markets labeled data that identifies market participant strategies is typically proprietary, and only the anonymous state-action pairs that result from the interaction of multiple market participants are publicly available. As a result, sequences of agent actions are not observable, restricting the applicability of existing work. In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. We frame the problem as an Imitation Learning (IL) task, and we learn a world-policy able to mimic all the agent behaviors upon different environmental states. We leverage the world-policy to explain each anonymous observation through an additive feature attribution method called SHAP (SHapley Additive exPlanations). Finally, by clustering the explanations we show that we are able to identify different agent policies and group observations accordingly. We evaluate our approach on simulated synthetic market data and a real-world financial dataset.
We show that our proposal significantly and consistently outperforms the existing methods, identifying different agent strategies.", "output": "K-SHAP: Policy Clustering Algorithm for Anonymous State-Action Pairs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-image generation (TTIG) models, a recent addition to creative AI, can generate images based on a text description. These models have begun to rival the work of professional creatives, and sparked discussions on the future of creative work, loss of jobs, and copyright issues, amongst other important implications. To support the sustainable adoption of TTIG, we must provide rich, reliable and transparent insights into how professionals perceive, adopt and use TTIG. Crucially though, the public debate is shallow, narrow and lacking transparency, while academic work has focused on studying the use of TTIG in a general artist population, but not on the perceptions and attitudes of professionals in a specific industry. In this paper, we contribute a qualitative, exploratory interview study on TTIG in the Finnish videogame industry. Through a Template Analysis on semi-structured interviews with 14 game professionals, we reveal 12 overarching themes, structured into 49 sub-themes on professionals' perception, adoption and use of TTIG systems in games industry practice.
Experiencing (yet another) change of roles and creative processes, our participants' reflections can inform discussions within the industry, be used by policymakers to inform urgently needed legislation, and support researchers in games, HCI and AI to support the sustainable, professional use of TTIG to benefit people and games as cultural artefacts.", "output": "\"An Adapt-or-Die Type of Situation\": Perception, Adoption, and Use of Text-To-Image-Generation AI by Game Industry Professionals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform challenging creative tasks (e.g., generate code and applications to solve problems, write stories, pieces of music, etc.). In this paper, we present a collaborative game design framework that combines interactive evolution and large language models to simulate the typical human design process. We use the former to exploit users' feedback for selecting the most promising ideas and large language models for a very complex creative task - the recombination and variation of ideas. In our framework, the process starts with a brief and a set of candidate designs, either generated using a language model or proposed by the users. Next, users collaborate on the design process by providing feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs.
We evaluated our framework on three game design tasks with human designers who collaborated remotely.", "output": "ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information regarding an individual's gait, body mass, and posture, making them an attractive tool for health monitoring, security, and human-computer interaction. However, the presence of various types of noise can compromise the accuracy of footstep-induced signal analysis. In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. In the preprocessing stage, features are extracted using the Fast Fourier Transform and wavelet transform to capture the underlying physics-governed dynamics of the system and extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with FFT results, and a CNN is used to obtain a condensed feature representation of the signal.
In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals, which outperforms the prevailing algorithms for a wide range of noise levels, evaluated using PSNR, SNR, and WMAPE.", "output": "Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE).
The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.", "output": "Using AI to Measure Parkinson's Disease Severity at Home."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications, due to its fast per-iteration complexity. However, one major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions, even asymptotically close to the solution. We view this as an artifact of discretization; that is to say, the Frank-Wolfe emph{flow}, which is its trajectory at asymptotically small step sizes, does not zig-zag, and reducing discretization error will go hand-in-hand in producing a more stabilized method, with better convergence properties. We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error, and whose local convergence rate over general convex sets accelerates from a rate of $O(1/k)$ to up to $O(1/k^{3/2})$.", "output": "Reducing Discretization Error in the Frank-Wolfe Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL) where multiple agents interact in the same environment and whose goal is to maximize the individual returns. Challenges arise when scaling up the number of agents due to the resultant non-stationarity that the many agents introduce.
In order to address this issue, Mean Field Games (MFG) rely on the symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates on the mean-field distribution. This work presents a different approach to stabilize the learning based on proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.", "output": "Regularization of the policy updates for stabilizing Mean Field Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As influencers play considerable roles in social media marketing, companies increase the budget for influencer marketing. Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank that ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, the graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period. An attentive recurrent neural network finally distinguishes highly effective influencers from other influencers by capturing the knowledge of the dynamics of influencer representations over time.
Extensive experiments have been conducted on an Instagram dataset that consists of 18,397 influencers with their 2,952,075 posts published within 12 months. The experimental results demonstrate that InfluencerRank outperforms existing baseline methods. An in-depth analysis further reveals that all of our proposed features and model components are beneficial to discover effective influencers.", "output": "InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches perform well in one task often exhibit poor performance in the other. This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters.
We believe that EGC bridges the gap between discriminative and generative learning.", "output": "EGC: Image Generation and Classification via a Diffusion Energy-Based Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment. Although there have been recent advances, generating detailed and enriched video descriptions continues to be a substantial challenge. In this work, we introduce Video ChatCaptioner, an innovative approach for creating more comprehensive spatiotemporal video descriptions. Our method employs a ChatGPT model as a controller, specifically designed to select frames for posing video content-driven questions. Subsequently, a robust algorithm is utilized to answer these visual queries. This question-answer framework effectively uncovers intricate video details and shows promise as a method for enhancing video content. Following multiple conversational rounds, ChatGPT can summarize enriched video content based on previous conversations. We qualitatively demonstrate that our Video ChatCaptioner can generate captions containing more visual details about the videos. The code is publicly available at ", "output": "Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks.
This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI.
To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at ", "output": "OpenAGI: When LLM Meets Domain Experts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is a new distributed learning paradigm, with privacy, utility, and efficiency as its primary pillars. Existing research indicates that it is unlikely to simultaneously attain infinitesimal privacy leakage, utility loss, and efficiency. Therefore, how to find an optimal trade-off solution is the key consideration when designing the FL algorithm. One common way is to cast the trade-off problem as a multi-objective optimization problem, i.e., the goal is to minimize the utility loss and efficiency reduction while constraining the privacy leakage not exceeding a predefined value. However, existing multi-objective optimization frameworks are very time-consuming, and do not guarantee the existence of the Pareto frontier, this motivates us to seek a solution to transform the multi-objective problem into a single-objective problem because it is more efficient and easier to be solved. To this end, we propose FedPAC, a unified framework that leverages PAC learning to quantify multiple objectives in terms of sample complexity, such quantification allows us to constrain the solution space of multiple objectives to a shared dimension, so that it can be solved with the help of a single-objective optimization algorithm.
Specifically, we provide the results and detailed analyses of how to quantify the utility loss, privacy leakage, privacy-utility-efficiency trade-off, as well as the cost of the attacker from the PAC learning perspective.", "output": "Probably Approximately Correct Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Appropriately regulating artificial intelligence is an increasingly urgent policy challenge. Legislatures and regulators lack the specialized knowledge required to best translate public demands into legal requirements. Overreliance on industry self-regulation fails to hold producers and users of AI systems accountable to democratic demands. Regulatory markets, in which governments require the targets of regulation to purchase regulatory services from a private regulator, are proposed. This approach to AI regulation could overcome the limitations of both command-and-control regulation and self-regulation. Regulatory market could enable governments to establish policy priorities for the regulation of AI, whilst relying on market forces and industry R&D efforts to pioneer the methods of regulation that best achieve policymakers' stated objectives.", "output": "Regulatory Markets: The Future of AI Governance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks.
These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.", "output": "Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks.
In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.", "output": "Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Homographs, words with the same spelling but different meanings, remain challenging in Neural Machine Translation (NMT). While recent works leverage various word embedding approaches to differentiate word sense in NMT, they do not focus on the pivotal components in resolving ambiguities of homographs in NMT: the hidden states of an encoder. In this paper, we propose a novel approach to tackle homographic issues of NMT in the latent space. We first train an encoder (aka \"HDR-encoder\") to learn universal sentence representations in a natural language inference (NLI) task. We further fine-tune the encoder using homograph-based synset sentences from WordNet, enabling it to learn word-level homographic disambiguation representations (HDR).
The pre-trained HDR-encoder is subsequently integrated with a transformer-based NMT in various schemes to improve translation accuracy. Experiments on four translation directions demonstrate the effectiveness of the proposed method in enhancing the performance of NMT systems in the BLEU scores (up to +2.3 compared to a solid baseline). The effects can be verified by other metrics (F1, precision, and recall) of translation accuracy in an additional disambiguation task. Visualization methods like heatmaps, T-SNE and translation examples are also utilized to demonstrate the effects of the proposed method.", "output": "Learning Homographic Disambiguation Representation for Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of approaches for trajectory prediction requires metrics to validate and compare their performance. Currently established metrics are based on Euclidean distance, which means that errors are weighted equally in all directions. Euclidean metrics are insufficient for structured environments like roads, since they do not properly capture the agent's intent relative to the underlying lane. In order to provide a reasonable assessment of trajectory prediction approaches with regard to the downstream planning task, we propose a new metric that is lane distance-based: Lane Miss Rate (LMR). For the calculation of LMR, the ground-truth and predicted endpoints are assigned to lane segments, more precisely their centerlines. Measured by the distance along the lane segments, predictions that are within a certain threshold distance to the ground-truth count as hits, otherwise they count as misses. LMR is then defined as the ratio of sequences that yield a miss. Our results on three state-of-the-art trajectory prediction models show that LMR preserves the order of Euclidean distance-based metrics.
In contrast to the Euclidean Miss Rate, qualitative results show that LMR yields misses for sequences where predictions are located on wrong lanes. Hits on the other hand result for sequences where predictions are located on the correct lane. This means that LMR implicitly weights Euclidean error relative to the lane and goes into the direction of capturing intents of traffic agents. The source code of LMR for Argoverse 2 is publicly available.", "output": "LMR: Lane Distance-Based Metric for Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data and training or deploying these state-of-the-art methods to resource constraint environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from less amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images achieving $94.86%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62%$.
Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. Our code and models will be released at a later date before the conference.", "output": "Astroformer: More Data Might Not be All You Need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are however trained on massive amounts of web data, which surfaces the peril of potential harmful biases that may leak in the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal that there exist severe occupational biases of neutral prompts majorly excluding groups of people from results for both models. Such biases can get mitigated by increasing the amount of specification in the prompt itself, although the prompting mitigation will not address discrepancies in image quality or other usages of the model or its representations in other scenarios. Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age.
Finally, an analysis of geographical location representations on everyday situations (e.g., park, food, weddings) shows that for most situations, images generated through default location-neutral prompts are closer and more similar to images generated for locations of United States and Germany.", "output": "Social Biases through the Text-to-Image Generation Lens."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. According to our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios.
Our source code is available at ", "output": "Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D object retrieval is an important yet challenging task, which has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval. Despite the increased difficulty, we believe that this task has the potential to drive useful applications in practice and facilitate more intuitive interactions with 3D objects. Five groups participated in our competition, submitting a total of 114 runs. While the results obtained in our competition are satisfactory, we note that the challenges presented by this task are far from being fully solved. As such, we provide insights into potential areas for future research and improvements.
We believe that we can help push the boundaries of 3D object retrieval and facilitate more user-friendly interactions via vision-language technologies.", "output": "TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity.
All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.", "output": "Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power of the popular CLIP model into our 3D encoder by aligning the encoded 3D scene features with the corresponding 2D image and text embeddings produced by CLIP. To assess our model's 3D world reasoning capability, we evaluate it on the downstream task of 3D Visual Question Answering. Experimental quantitative and qualitative results show that our pre-training method outperforms state-of-the-art works in this task and leads to an interpretable representation of 3D scene features.", "output": "CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity.
As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world \\textit{in vivo} human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at", "output": "$E(3) \\times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In biomedical research and artificial intelligence, access to large, well-balanced, and representative datasets is crucial for developing trustworthy applications that can be used in real-world scenarios. However, obtaining such datasets can be challenging, as they are often restricted to hospitals and specialized facilities. To address this issue, the study proposes to generate highly realistic synthetic faces exhibiting drug abuse traits through augmentation. The proposed method, called \"3DG-GA\", Deep De-identified anonymous Dataset Generation, uses Genetics Algorithm as a strategy for synthetic faces generation. The algorithm includes GAN artificial face generation, forgery detection, and face recognition. Initially, a dataset of 120 images of actual facial drug abuse is used. 
By preserving the drug traits, the 3DG-GA provides a dataset containing 3000 synthetic facial drug abuse images. The dataset will be open to the scientific community, which can reproduce our results and benefit from the generated datasets while avoiding legal or ethical restrictions.", "output": "Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. 
Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.", "output": "PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the joint detection-and-tracking paradigm has been a very popular way of tackling the multi-object tracking (MOT) task. Many of the methods following this paradigm use the object center keypoint for detection. However, we argue that the center point is not optimal since it is often not visible in crowded scenarios, which results in many missed detections when the objects are partially occluded. We propose TopTrack, a joint detection-and-tracking method that uses the top of the object as a keypoint for detection instead of the center because it is more often visible. Furthermore, TopTrack processes consecutive frames in separate streams in order to facilitate training. We performed experiments to show that using the object top as a keypoint for detection can reduce the amount of missed detections, which in turn leads to more complete trajectories and fewer lost trajectories. TopTrack manages to achieve competitive results with other state-of-the-art trackers on two MOT benchmarks.", "output": "TopTrack: Tracking Objects By Their Top."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Short-form videos have explosive popularity and have dominated the new social media trends. Prevailing short-video platforms, \\textit{e.g.}, Kuaishou (Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way we consume and create content. 
For video content creation and understanding, shot boundary detection (SBD) is one of the most essential components in various scenarios. In this work, we release a new public Short video sHot bOundary deTection dataset, named SHOT, consisting of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos. Leveraging this new data wealth, we propose to optimize the model design for video SBD, by conducting neural architecture search in a search space encapsulating various advanced 3D ConvNets and Transformers. Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when being derived and evaluated on our newly constructed SHOT dataset. Moreover, to validate the generalizability of the AutoShot architecture, we directly evaluate it on another three public datasets: ClipShots, BBC and RAI, and the F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%, 0.9% and 1.2%, respectively. The SHOT dataset and code can be found in .", "output": "AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the significant progress in face recognition in the past years, they are often treated as \"black boxes\" and have been criticized for lacking explainability. It becomes increasingly important to understand the characteristics and decisions of deep face recognition systems to make them more acceptable to the public. Explainable face recognition (XFR) refers to the problem of interpreting why the recognition model matches a probe face with one identity over others. Recent studies have explored the use of visual saliency maps as an explanation, but they often lack a deeper analysis in the context of face recognition. 
This paper starts by proposing a rigorous definition of explainable face recognition (XFR) which focuses on the decision-making process of the deep recognition model. Following the new definition, a similarity-based RISE algorithm (S-RISE) is then introduced to produce high-quality visual saliency maps. Furthermore, an evaluation approach is proposed to systematically validate the reliability and accuracy of general visual saliency-based XFR methods.", "output": "Explanation of Face Recognition via Saliency Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An ego vehicle following a virtual lead vehicle planned route is an essential component when autonomous and non-autonomous vehicles interact. Yet, there is a question about the driver's ability to follow the planned lead vehicle route. Thus, predicting the trajectory of the ego vehicle route given a lead vehicle route is of interest. We introduce a new dataset, the FollowMe dataset, which offers a motion and behavior prediction problem by answering the latter question of the driver's ability to follow a lead vehicle. We also introduce a deep spatio-temporal graph model FollowMe-STGCNN as a baseline for the dataset. In our experiments and analysis, we show the design benefits of FollowMe-STGCNN in capturing the interactions that lie within the dataset. 
We contrast the performance of FollowMe-STGCNN with prior motion prediction models, showing the need to have a different design mechanism to address the lead vehicle following settings.", "output": "FollowMe: Vehicle Behaviour Prediction in Autonomous Vehicle Settings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on related benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our knowledge, it is the first systematic assessment approach for deepfake detectors that not only reports the general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analysis of three popular deepfake detection methods are further presented in this paper. 
In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors.", "output": "Assessment Framework for Deepfake Detection in Real-world Situations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at ", "output": "Label-Free Concept Bottleneck Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep learning models have become the predominant method for medical image segmentation, they are typically not capable of generalizing to unseen segmentation tasks involving new anatomies, image modalities, or labels. 
Given a new segmentation task, researchers generally have to train or fine-tune models, which is time-consuming and poses a substantial barrier for clinical researchers, who often lack the resources and expertise to train neural networks. We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. Given a query image and example set of image-label pairs that define a new segmentation task, UniverSeg employs a new Cross-Block mechanism to produce accurate segmentation maps without the need for additional training. To achieve generalization to new tasks, we have gathered and standardized a collection of 53 open-access medical segmentation datasets with over 22,000 scans, which we refer to as MegaMedical. We used this collection to train UniverSeg on a diverse set of anatomies and imaging modalities. We demonstrate that UniverSeg substantially outperforms several related methods on unseen tasks, and thoroughly analyze and draw insights about important aspects of the proposed system. The UniverSeg source code and model weights are freely available at ", "output": "UniverSeg: Universal Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks for image classification, its interpretability remains an open research question. 
This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Our findings provide insights into the applicability of ViT explanations in medical imaging and highlight the importance of using appropriate evaluation criteria for comparing them.", "output": "Towards Evaluating Explanations of Vision Transformers for Medical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). 
Moreover, in text-conditional models, fixing those noise maps while changing the text prompt modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.", "output": "An Edit Friendly DDPM Noise Space: Inversion and Manipulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Current existing vision systems for yield estimation face difficulties in occlusion and scalability as they utilize a camera system that is large and expensive, which is unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated in a lab environment with real tomato fruits and fake leaves to simulate occlusion in the real farm environment. To improve accuracy by addressing fruit occlusion, our three-camera system was able to achieve a height measurement accuracy of 0.9114 and a width accuracy of 0.9443.", "output": "Visual based Tomato Size Measurement System for an Indoor Farming Environment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. 
However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, and is substantially faster than the baseline method NeuralRGBD.", "output": "Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry. Hand-engineered methods like Harris corners, SIFT, and HOG descriptors have been used for decades; more recently, there has been a trend to introduce learning in an attempt to improve keypoint detectors. On inspection however, the results are difficult to interpret; recent learning-based methods employ a vast diversity of experimental setups and design choices: empirical results are often reported using different backbones, protocols, datasets, types of supervision or tasks. Since these differences are often coupled together, it raises a natural question on what makes a good learned keypoint detector. 
In this work, we revisit the design of existing keypoint detectors by deconstructing their methodologies and identifying the key components. We re-design each component from first principles and propose Simple Learned Keypoints (SiLK) that is fully-differentiable, lightweight, and flexible. Despite its simplicity, SiLK advances the state-of-the-art on Detection Repeatability and Homography Estimation tasks on HPatches and the 3D Point-Cloud Registration task on ScanNet, and achieves competitive performance to state-of-the-art on camera pose estimation in the 2022 Image Matching Challenge and ScanNet.", "output": "SiLK -- Simple Learned Keypoints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current top-leading solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed and the first annotated frames. They simply exploit the supervisory signals from the groundtruth masks for learning mask prediction only, without posing any constraint on the space-time correspondence matching, which, however, is the fundamental building block of such a regime. To alleviate this crucial yet commonly ignored issue, we devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching during network learning. Through comprehensively exploring the intrinsic coherence in videos on pixel and object levels, our algorithm reinforces the standard, fully supervised training of mask segmentation with label-free, contrastive correspondence learning. 
Without requiring extra annotation cost during training, causing speed delay during deployment, or incurring architectural modification, our algorithm provides solid performance gains on four widely used benchmarks, i.e., DAVIS2016&2017 and YouTube-VOS2018&2019, on top of famous matching-based VOS solutions.", "output": "Boosting Video Object Segmentation via Space-time Correspondence Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an embarrassingly simple yet highly effective zero-shot semantic segmentation (ZS3) method, based on the pre-trained vision-language model CLIP. First, our study provides a couple of key discoveries: (i) the global tokens (a.k.a. [CLS] tokens in Transformer) of the text branch in CLIP provide a powerful representation of semantic information and (ii) these text-side [CLS] tokens can be regarded as category priors to guide the CLIP visual encoder to pay more attention to the corresponding region of interest. Based on that, we build upon the CLIP model as a backbone which we extend with a One-Way [CLS] token navigation from text to the visual branch that enables zero-shot dense prediction, dubbed \\textbf{ClsCLIP}. Specifically, we use the [CLS] token output from the text branch, as an auxiliary semantic prompt, to replace the [CLS] token in shallow layers of the ViT-based visual encoder. This one-way navigation embeds such global category prior earlier and thus promotes semantic segmentation. Furthermore, to better segment tiny objects in ZS3, we further enhance ClsCLIP with a local zoom-in strategy, which employs a region proposal pre-processing, and we get ClsCLIP+. 
Extensive experiments demonstrate that our proposed ZS3 method achieves SOTA performance, and it is even comparable with those few-shot semantic segmentation methods.", "output": "[CLS] Token is All You Need for Zero-Shot Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, a loss function that addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The Instance-wise component improves the detection of small instances or \"blobs\" in image datasets with both large and small instances. The Center-of-Instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, in the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided a better balanced segmentation, and significantly outperformed the Dice loss with an improvement of $1.7-3.7\\%$ and the blob loss by $0.6-5.0\\%$ in terms of the Dice similarity coefficient on both validation and test sets, suggesting that the ICI loss is a potential solution to the instance imbalance problem.", "output": "Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views. 
To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, yet overlooking the importance of intra-view information for high-resolution reconstruction. It also leads to problems of wrong texture in recovered images. To address this issue, we explore the interdependencies between various hierarchies from intra-view and propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR). Specifically, we design a cross-hierarchy information mining block (CHIMB) that leverages channel attention and large kernel convolution attention to extract both global and local features from the intra-view, enabling the efficient restoration of accurate texture details. Additionally, a cross-view interaction module (CVIM) is proposed to fuse similar features from different views by utilizing cross-view attention mechanisms, effectively adapting to the binocular scene. Extensive experiments demonstrate the effectiveness of our method. CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters. The source code and pre-trained models are available at ", "output": "Cross-View Hierarchy Network for Stereo Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural image compression methods have seen increasingly strong performance in recent years. However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which stands in the way of real-world deployment. This paper takes a step forward in closing this gap in decoding complexity by adopting shallow or even linear decoding transforms. 
To compensate for the resulting drop in compression performance, we exploit the often asymmetrical computation budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding. We theoretically formalize the intuition behind, and our experimental results establish, a new frontier in the trade-off between rate-distortion and decoding complexity for neural image compression. Specifically, we achieve rate-distortion performance competitive with the established mean-scale hyperprior architecture of Minnen et al. (2018), while reducing the overall decoding complexity by 80%, or over 90% for the synthesis transform alone. Our code can be found at ", "output": "Asymmetrically-powered Neural Image Compression with Shallow Decoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images. Instead of relying on laborious 3D, multi-view or camera pose annotation, ShapeClipper learns shape reconstruction from a set of single-view segmented images. The key idea is to facilitate shape learning via CLIP-based shape consistency, where we encourage objects with similar CLIP encodings to share similar shapes. We also leverage off-the-shelf normals as an additional geometric constraint so the model can learn better bottom-up reasoning of detailed surface geometry. These two novel consistency constraints, when used to regularize our model, improve its ability to learn both global shape structure and local geometric details. 
We evaluate our method over three challenging real-world datasets, Pix3D, Pascal3D+, and OpenImages, where we achieve superior performance over state-of-the-art methods.", "output": "ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Set-based face recognition (SFR) aims to recognize the face sets in the unconstrained scenario, where the appearance of the same identity may change dramatically with extreme variances (e.g., illumination, pose, expression). We argue that the two crucial issues in SFR, the face quality and burstiness, are both identity-irrelevant and variance-relevant. The quality and burstiness assessment are interfered with by the entanglement of identity, and the face recognition is interfered with by the entanglement of variance. Thus we propose to separate the identity features from the variance features in a light-weighted set-based disentanglement framework. Beyond disentanglement, the variance features are fully utilized to indicate face quality and burstiness in a set, rather than being discarded after training. To suppress face burstiness in the sets, we propose a vocabulary-based burst suppression (VBS) method which quantizes faces with a reference vocabulary. With inter-word and intra-word normalization operations on the assignment scores, the face burstiness degrees are appropriately estimated. The extensive illustrations and experiments demonstrate the effect of the disentanglement framework with VBS, which gets new state-of-the-art on the SFR benchmarks. 
The code will be released at", "output": "Set-Based Face Recognition Beyond Disentanglement: Burstiness Suppression With Variance Vocabulary."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Transformers have shown promising performance in various vision tasks. However, the high costs of global self-attention remain challenging for Transformers, especially for high-resolution vision tasks. Local self-attention runs attention computation within a limited region for the sake of efficiency, resulting in insufficient context modeling as their receptive fields are small. In this work, we introduce two new attention modules to enhance the global modeling capability of the hierarchical vision transformer, namely, random sampling windows (RS-Win) and important region windows (IR-Win). Specifically, RS-Win samples random image patches to compose the window, following a uniform distribution, i.e., the patches in RS-Win can come from any position in the image. IR-Win composes the window according to the weights of the image patches in the attention map. Notably, RS-Win is able to capture global information throughout the entire model, even in earlier, high-resolution stages. IR-Win enables the self-attention module to focus on important regions of the image and capture more informative features. 
Incorporated with these designs, RSIR-Win Transformer demonstrates competitive performance on common vision tasks.", "output": "RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between these two images. Previous methods search for correspondence across the entire reference image, and this type of global matching easily produces mismatches. We summarize the difficulties in two aspects: (1) When the reference image only contains a part of objects related to the target image, improper correspondence will be established in unrelated regions. (2) It is prone to mismatch in regions where the shape or texture of the object is easily confused. To overcome these issues, we propose SPColor, a semantic prior guided exemplar-based image colorization framework. Different from previous methods, SPColor first coarsely classifies pixels of the reference and target images to several pseudo-classes under the guidance of the semantic prior, then the correspondences are only established locally between the pixels in the same class via the newly designed semantic prior guided correspondence network. In this way, improper correspondence between different semantic classes is explicitly excluded, and the mismatch is obviously alleviated. Besides, to better preserve the color from the reference, a similarity masked perceptual loss is designed. Note that the carefully designed SPColor utilizes the semantic prior provided by an unsupervised segmentation model, which is free of additional manual semantic annotations. 
Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on public datasets.", "output": "SPColor: Semantic Prior Guided Exemplar-based Image Colorization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input image patches with those prototypes. We propose the first medical prototype network (MProtoNet) to extend ProtoPNet to brain tumor classification with 3D multi-parametric magnetic resonance imaging (mpMRI) data. To address different requirements between 2D natural images and 3D mpMRIs, especially in terms of localizing attention regions, a new attention module with soft masking and online-CAM loss is introduced. Soft masking helps sharpen attention maps, while online-CAM loss directly utilizes image-level labels when training the attention module. MProtoNet achieves statistically significant improvements in interpretability metrics of both correctness and localization coherence (with a best activation precision of $0.713\\pm0.058$) without human-annotated labels during training, when compared with GradCAM and several ProtoPNet variants. 
The source code is available at", "output": "MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Manipulatives used in the right way help improve mathematical concepts, leading to better learning outcomes. In this paper, we present a phygital (physical + digital) curriculum-inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on, which requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and light-weight mobile architectures, we enable our system to understand user interactions and provide real-time audiovisual feedback. Experimental results show that our network runs in real time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.", "output": "Gamifying Math Education using Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based image denoising methods have achieved encouraging results in the past year. However, they must use linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU storage space. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. 
In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method can fully preserve the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which can further reduce model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be public.", "output": "EWT: Efficient Wavelet-Transformer for Single Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of multimodal learning in the cross-modal retrieval task, the remarkable progress relies on the correct correspondence among multimedia data. However, collecting such ideal data is expensive and time-consuming. In practice, most widely used datasets are harvested from the Internet and inevitably contain mismatched pairs. Training on such noisy correspondence datasets causes performance degradation because the cross-modal retrieval methods can wrongly enforce the mismatched data to be similar. To tackle this problem, we propose a Meta Similarity Correction Network (MSCN) to provide reliable similarity scores. We view a binary classification task as the meta-process that encourages the MSCN to learn discrimination from positive and negative meta-data. To further alleviate the influence of noise, we design an effective data purification strategy using meta-data as prior knowledge to remove the noisy samples. 
Extensive experiments are conducted to demonstrate the strengths of our method under both synthetic and real-world noise, on datasets including Flickr30K, MS-COCO, and Conceptual Captions.", "output": "Noisy Correspondence Learning with Meta Similarity Correction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. 
We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.", "output": "Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest in automated ECG interpretation using machine learning, most current studies focus solely on classification or regression tasks and overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECGs as images is more affordable and accessible, we process ECGs as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECGs into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. 
More importantly, our findings could serve as a crucial resource for providing diagnostic services in regions where only paper-printed ECG images are accessible due to past underdevelopment.", "output": "Converting ECG Signals to Images for Efficient Image-text Retrieval via Encoding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. NeRF achieves impressive performance in rendering images for novel views similar to the input views while suffering for novel views that are significantly different from the training views. To address this issue, we utilize the holistic priors, including pseudo depth maps and view coverage information, from neural reconstruction to guide the learning of implicit neural representations of 3D indoor scenes. Concretely, an off-the-shelf neural reconstruction method is leveraged to generate a geometry scaffold. Then, two loss functions based on the holistic priors are proposed to improve the learning of NeRF: 1) A robust depth loss that can tolerate the error of the pseudo depth map to guide the geometry learning of NeRF; 2) A variance loss to regularize the variance of implicit neural representations to reduce the geometry and color ambiguity in the learning procedure. These two loss functions are modulated during NeRF optimization according to the view coverage information to reduce the negative influence brought by the view coverage imbalance. 
Extensive results demonstrate that our NeRFVS outperforms state-of-the-art view synthesis methods quantitatively and qualitatively on indoor scenes, achieving high-fidelity free navigation results.", "output": "NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel Text-to-Image Generation Network, Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the layout of synthesized images without any auxiliary information. The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (which refers to locations of objects and background) of a synthesized image with that of its corresponding real image. In the ALR module, we propose an Adaptive Layout Refinement (ALR) loss to balance the matching of hard and easy features, for more efficient layout structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely-used datasets show that ALR-GAN performs competitively at the Text-to-Image generation task.", "output": "ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad \"middle spectrum\" area between channel pruning and conventional grouped convolution. 
Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on the ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With a 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on the MS COCO dataset for object detection show similar observations. Our code and trained models are available at ", "output": "Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multi-modal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. 
In addition, we disentangle vanilla prompts into three types in order to learn different optimizing objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experiment results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.", "output": "Efficient Multimodal Fusion via Interactive Prompting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique to achieve personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computational backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of the normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimations of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. 
Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely-used FSL benchmarks and data from real applications.", "output": "Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous vehicles rely on a variety of sensors to gather information about their surroundings. The vehicle's behavior is planned based on the environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition for environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance changes under adverse weather conditions like fog, snow or rain. This limitation recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gathered, analyzed, and discussed different aspects of dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms, and sensor fusion to mitigate adverse weather induced shortcomings. 
We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.", "output": "Survey on LiDAR Perception in Adverse Weather Conditions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Light fields are a type of image data that capture both spatial and angular scene information by recording light rays emitted by a scene from different orientations. In this context, spatial information is defined as features that remain static regardless of perspectives, while angular information refers to features that vary between viewpoints. We propose a novel neural network that, by design, can separate angular and spatial information of a light field. The network represents spatial information using spatial kernels shared among all Sub-Aperture Images (SAIs), and angular information using sets of angular kernels for each SAI. To further improve the representation capability of the network without increasing the parameter count, we also introduce angular kernel allocation and kernel tensor decomposition mechanisms. Extensive experiments demonstrate the benefits of information separation: when applied to the compression task, our network outperforms other state-of-the-art methods by a large margin. And angular information can be easily transferred to other scenes for rendering dense views, showing the successful separation and the potential use case for the view synthesis task. 
We plan to release the code upon acceptance of the paper to encourage further research on this topic.", "output": "Learning-based Spatial and Angular Information Separation for Light Field Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Monocular depth estimation is fundamental for 3D scene understanding and downstream applications. However, even under the supervised setup, it is still challenging and ill-posed due to the lack of full geometric constraints. Although a scene can consist of millions of pixels, there are fewer high-level patterns. We propose iDisc to learn those patterns with internal discretized representations. The method implicitly partitions the scene into a set of high-level patterns. In particular, our new module, Internal Discretization (ID), implements a continuous-discrete-continuous bottleneck to learn those concepts without supervision. In contrast to state-of-the-art methods, the proposed model does not enforce any explicit constraints or priors on the depth output. The whole network with the ID module can be trained end-to-end, thanks to the bottleneck module based on attention. Our method sets the new state of the art with significant improvements on NYU-Depth v2 and KITTI, outperforming all published methods on the official KITTI benchmark. iDisc can also achieve state-of-the-art results on surface normal estimation. Further, we explore the model generalization capability via zero-shot testing. We observe the compelling need to promote diversification in the outdoor scenario. Hence, we introduce splits of two autonomous driving datasets, DDAD and Argoverse. 
Code is available at .", "output": "iDisc: Internal Discretization for Monocular Depth Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only a few views, our method infers the interior planes that are observable in the input images. Our neural architecture is trained in a category-agnostic manner and it consists of a motion-aware multi-view analysis phase including pose, depth, and motion estimations, followed by interior plane detection in images and 3D space, and finally multi-view plane fusion. In addition, our method also predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object. We evaluate our method by quantitative and qualitative comparisons to baselines and alternative solutions, as well as testing on untrained object categories and real image inputs to assess its generalization capabilities.", "output": "RoSI: Recovering 3D Shape Interiors from Few Articulation Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The structural re-parameterization (SRP) technique is a novel deep learning technique that achieves interconversion between different network architectures through equivalent parameter transformations. 
This technique enables the mitigation of the extra costs for performance improvement during training, such as parameter size and inference time, through these transformations during inference, and therefore SRP has great potential for industrial and practical applications. The existing SRP methods have successfully considered many commonly used architectures, such as normalizations, pooling methods, and multi-branch convolution. However, the widely used self-attention modules cannot be directly implemented by SRP because these modules usually act on the backbone network in a multiplicative manner and the modules' output is input-dependent during inference, which limits the application scenarios of SRP. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training. This observation inspires us to propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the self-attention mechanism. Extensive experiments conducted on several standard benchmarks demonstrate the effectiveness of ASR in generally improving the performance of existing backbone networks, self-attention modules, and SRP methods without any elaborated model crafting. We also analyze the limitations and provide experimental or theoretical evidence for the strong robustness of the proposed ASR.", "output": "ASR: Attention-alike Structural Re-parameterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. 
However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply a deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly fewer computational costs.", "output": "DDT: Dual-branch Deformable Transformer for Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, event cameras have shown large applicability in several computer vision fields, especially concerning tasks that require high temporal resolution. In this work, we investigate the usage of such kind of data for emotion recognition by presenting NEFER, a dataset for Neuromorphic Event-based Facial Expression Recognition. NEFER is composed of paired RGB and event videos representing human faces labeled with the respective emotions and also annotated with face bounding boxes and facial landmarks. We detail the data acquisition process as well as provide a baseline method for RGB and event data. The collected data captures subtle micro-expressions, which are hard to spot with RGB data, yet emerge in the event domain. 
We report double the recognition accuracy for the event-based approach, proving the effectiveness of a neuromorphic approach for analyzing fast and hardly detectable expressions and the emotions they conceal.", "output": "Neuromorphic Event-based Facial Expression Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visualized explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to both one-stage detectors and two-stage detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state-of-the-art both effectively and efficiently. We next propose a training scheme, Odam-Train, to improve the explanation ability on object discrimination of the detector through encouraging consistency between explanations for detections on the same object, and distinct explanations for detections on different objects. Based on the heat maps produced by ODAM with Odam-Train, we propose Odam-NMS, which considers the information of the model's explanation for each prediction to distinguish the duplicate detected objects. 
We present a detailed analysis of the visualized explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM.", "output": "ODAM: Gradient-based instance-specific visual explanations for object detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods fail to effectively fuse the multi-view features and utilize the metric information provided by the dissimilar samples, leading to limited retrieval precision. Current methods utilize weighted sum or concatenation to fuse the multi-view features. We argue that these fusion methods cannot capture the interaction among different views. Furthermore, these methods ignore the information provided by the dissimilar samples. We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems. Extensive empirical evidence is presented to show that gate-based fusion is better than typical methods. We introduce deep metric learning to the multi-view hashing problems, which can utilize metric information of dissimilar samples. On the MIR-Flickr25K, MS COCO, and NUS-WIDE, our method outperforms the current state-of-the-art methods by a large margin (up to 15.28 mean Average Precision (mAP) improvement).", "output": "Deep Metric Multi-View Hashing for Multimedia Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. 
State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene and employ decision-level fusion to integrate these heterogeneous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present a novel GPU-friendly supervised contrastive learning framework SuMoCo to learn better representations. Furthermore, we fine-grained the test split of the DAD dataset to enable the multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available.", "output": "Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score.
We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.", "output": "Sign Language Translation from Instructional Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Flatlandia, a novel problem for visual localization of an image from object detections composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects in respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than standard large-scale 3D models often used in visual localization while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.", "output": "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is well known that a band-limited signal can be reconstructed from its uniformly spaced samples if the sampling rate is sufficiently high.
More recently, it has been proved that one can reconstruct a 1D band-limited signal even if the exact sample locations are unknown, but given just the distribution of the sample locations and their ordering in 1D. In this work, we extend the analytical bounds on the reconstruction error in such scenarios for quasi-bandlimited signals. We also prove that the method for such a reconstruction is resilient to a certain proportion of errors in the specification of the sample location ordering. We then express the problem of tomographic reconstruction of 2D images from 1D Radon projections under unknown angles with known angle distribution, as a special case for reconstruction of quasi-bandlimited signals from samples at unknown locations with known distribution. Building upon our theoretical background, we present asymptotic bounds for 2D quasi-bandlimited image reconstruction from 1D Radon projections in the unknown angles setting, which commonly occurs in cryo-electron microscopy (cryo-EM). To the best of our knowledge, this is the first piece of work to perform such an analysis for 2D cryo-EM, even though the associated reconstruction algorithms have been known for a long time.", "output": "Analysis of Tomographic Reconstruction of 2D Images using the Distribution of Unknown Projection Angles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Magnetic Resonance (MR) images suffer from various types of artifacts due to motion, spatial resolution, and under-sampling. Conventional deep learning methods deal with removing a specific type of artifact, leading to separately trained models for each artifact type that lack the shared knowledge generalizable across artifacts. Moreover, training a model for each type and amount of artifact is a tedious process that consumes more training time and storage of models.
On the other hand, the shared knowledge learned by jointly training the model on multiple artifacts might be inadequate to generalize under deviations in the types and amounts of artifacts. Model-agnostic meta-learning (MAML), a nested bi-level optimization framework, is a promising technique to learn common knowledge across artifacts in the outer level of optimization, and artifact-specific restoration in the inner level. We propose curriculum-MAML (CMAML), a learning process that integrates MAML with curriculum learning to impart the knowledge of variable artifact complexity to adaptively learn restoration of multiple artifacts during training. Comparative studies against Stochastic Gradient Descent and MAML, using two cardiac datasets, reveal that CMAML exhibits (i) better generalization with improved PSNR for 83% of unseen types and amounts of artifacts and improved SSIM in all cases, and (ii) better artifact suppression in 4 out of 5 cases of composite artifacts (scans with multiple artifacts).", "output": "Generalizable Deep Learning Method for Suppressing Unseen and Multiple MRI Artifacts Using Meta-learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits the descendant-class discrimination. We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus on the subtle differences among descendant classes. We model this prompting mechanism into a Transformer with Hierarchical Prompting (TransHP).
TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP maintain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification on accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP well exploits the hierarchical information.", "output": "TransHP: Image Classification with Hierarchical Prompting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method, by introducing a faithfulness task, and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results.
To aid reproducibility and further extension of our work, we open source our implementation:", "output": "VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. They both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where the ultrafast automated model compression for various computational cost constraints is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy.
Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.", "output": "Learning Accurate Performance Predictors for Ultrafast Automated Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "More information leads to better decisions and predictions, right? Confirming this hypothesis, several studies concluded that the simultaneous use of optical and thermal images leads to better predictions in crowd counting. However, the way multimodal models extract enriched features from both modalities is not yet fully understood. Since the use of multimodal data usually increases the complexity, inference time, and memory requirements of the models, it is relevant to examine the differences and advantages of multimodal compared to monomodal models. In this work, all available multimodal datasets for crowd counting are used to investigate the differences between monomodal and multimodal models. To do so, we designed a monomodal architecture that considers the current state of research on monomodal crowd counting. In addition, several multimodal architectures have been developed using different multimodal learning strategies. The key components of the monomodal architecture are also used in the multimodal architectures to be able to answer whether multimodal models perform better in crowd counting in general. Surprisingly, no general answer to this question can be derived from the existing datasets. We found that the existing datasets hold a bias toward thermal images. This was determined by analyzing the relationship between the brightness of optical images and crowd count as well as examining the annotations made for each dataset.
Since answering this question is important for future real-world applications of crowd counting, this paper establishes criteria for a potential dataset suitable for answering whether multimodal models perform better in crowd counting in general.", "output": "Why Existing Multimodal Crowd Counting Datasets Can Lead to Unfulfilled Expectations in Real-World Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations.", "output": "Leveraging triplet loss for unsupervised action segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM).
Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.", "output": "Intriguing properties of synthetic images: from generative adversarial networks to diffusion models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Predicting high-fidelity future human poses, from a historically observed sequence, is decisive for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, at test-time, once encountering unseen motion categories (out-of-distribution), the predicted poses tend to be unreliable.
Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically tend to have a low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In terms of general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), achieves significant improvements.", "output": "Meta-Auxiliary Learning for Adaptive Human Pose Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames, including frames where some points are invisible. To achieve that, the estimation is driven by re-rendering the input video frames as well as possible through differentiable rendering, which has not been used for tracking before. The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
We improve the state-of-the-art in 2D segmentation tracking on three different datasets with mostly rigid objects.", "output": "Tracking by 3D Model Estimation of Unknown Objects in Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO techniques. To this end, we propose a certified ZO preprocessing technique for removing adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, by prepending our RDUNet to the black-box model, ensuring black-box defense. We further propose ZO-AE-RUDS in which RDUNet followed by autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-100, Tiny ImageNet, STL-10, and the MNIST dataset for image reconstruction tasks.
Our proposed defense methods ZO-RUDS and ZO-AE-RUDS beat SOTA with a huge margin of 35% and 9% for low-dimensional (CIFAR-10) and with a margin of 20.61% and 23.51% for high-dimensional (STL-10) datasets, respectively.", "output": "Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel method for Zero-Shot Anomaly Localization that leverages a bidirectional mapping derived from the 1-dimensional Wasserstein Distance. The proposed approach allows pinpointing the anomalous regions in a texture with increased precision by aggregating the contribution of a pixel to the errors of all nearby patches. We validate our solution on several datasets and obtain more than a 40% reduction in error over the previous state of the art on the MVTec AD dataset in a zero-shot setting.", "output": "High-Fidelity Zero-Shot Texture Anomaly Localization Using Feature Correspondence Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complicated quality-related factors, in this paper, we decompose video into three levels (i.e., patch level, frame level, and clip level), and propose a novel Zoom-VQA architecture to perceive spatio-temporal features at different levels.
It integrates three components: patch attention module, frame pyramid alignment, and clip ensemble strategy, respectively for capturing region-of-interest in the spatial dimension, multi-level information at different feature levels, and distortions distributed over the temporal dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA challenge. Notably, Zoom-VQA has outperformed the previous best results on two subsets of LSVQ, achieving 0.8860 (+1.0%) and 0.7985 (+1.9%) of SRCC on the respective subsets. Adequate ablation studies further verify the effectiveness of each component. Codes and models are released in", "output": "Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision transformers have been applied successfully for image recognition tasks. There have been either multi-headed self-attention based (ViT \\cite{dosovitskiy2020image}, DeIT \\cite{touvron2021training}) similar to the original work in textual models or more recently based on spectral layers (FNet \\cite{lee2021fnet}, GFNet \\cite{rao2021global}, AFNO \\cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention play a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations.
For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, SpectFormer-L achieves 85.7%, which is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Stanford Car datasets. We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.", "output": "SpectFormer: Frequency and Attention is what you need in a Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document understanding from various aspects, including document element recognition, document layout structural understanding as well as contextual understanding and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding that limits on the single document page to the new scale that asks questions over the full document of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchically structural relationships between different document elements to boost the document structural understanding.
The performances are compared with several baselines over different question types and tasks\\footnote{The full dataset will be released after paper acceptance.}", "output": "PDF-VQA: A New Dataset for Real-World VQA on PDF Documents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of high-definition display devices, the practical scenario of Super-Resolution (SR) usually needs to super-resolve large input like 2K to higher resolution (4K/8K). To reduce the computational and memory cost, current methods first split the large input into local patches and then merge the SR patches into the output. These methods adaptively allocate a subnet for each patch. Quantization is a very important technique for network acceleration and has been used to design the subnets. Current methods train an MLP bit selector to determine the proper bit for each layer. However, they uniformly sample subnets for training, making simple subnets overfitted and complicated subnets underfitted. Therefore, the trained bit selector fails to determine the optimal bit. Apart from this, the introduced bit selector brings additional cost to each layer of the SR network. In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. CABM also learns a bit selector for each layer during training. After training, we analyze the relation between the edge information of an input patch and the bit of each layer. We observe that the edge information can be an effective metric for the selected bit. Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference. The bit configuration of the SR network can be determined by the lookup tables of all layers.
Our strategy can find better bit configuration, resulting in more efficient mixed precision networks. We conduct detailed experiments to demonstrate the generalization ability of our method. The code will be released.", "output": "CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the representation learning performance of small models. In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. Different from existing SSL-KD methods that transfer knowledge from a static pre-trained teacher to a student, in MOKD, two different models learn collaboratively in a self-supervised manner. Specifically, MOKD consists of two distillation modes: self-distillation and cross-distillation modes. Among them, self-distillation performs self-supervised learning for each model independently, while cross-distillation realizes knowledge interaction between different models. In cross-distillation, a cross-attention feature search strategy is proposed to enhance the semantic feature alignment between different models. As a result, the two models can absorb knowledge from each other to boost their representation learning performance. Extensive experimental results on different backbones and datasets demonstrate that two heterogeneous models can benefit from MOKD and outperform their independently trained baseline.
In addition, MOKD also outperforms existing SSL-KD methods for both the student and teacher models.", "output": "Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deep fakes. The prevalence of deep fakes in today's society is a serious concern, and our findings can help mitigate their negative impact.", "output": "Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects.
According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.", "output": "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Super-Resolution (SR) has gained increasing research attention over the past few years. With the development of Deep Neural Networks (DNNs), many super-resolution methods based on DNNs have been proposed. Although most of these methods are aimed at ordinary frames, there are few works on super-resolution of omnidirectional frames. In these works, omnidirectional frames are projected from the 3D sphere to a 2D plane by Equi-Rectangular Projection (ERP). Although ERP has been widely used for projection, it has severe projection distortion near poles. Current DNN-based SR methods use 2D convolution modules, which are more suitable for the regular grid. In this paper, we find that different projection methods have great impact on the performance of DNNs. To study this problem, a comprehensive comparison of projections in omnidirectional super-resolution is conducted. We compare the SR results of different projection methods. Experimental results show that Equi-Angular cube map projection (EAC), which has minimal distortion, achieves the best result in terms of WS-PSNR compared with other projections.
Code and data will be released.", "output": "A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional neural networks learn spatial features and are heavily interlinked within kernels. The SE module has broken the traditional route of neural networks passing the entire result to the next layer. Instead SE only passes important features to be learned with its squeeze and excitation (SE) module. We propose variations of the SE module which improve the process of squeeze and excitation and enhance the performance. The proposed squeezing or exciting of the layer makes it possible to have a smooth transition of layer weights. These proposed variations also retain the characteristics of the SE module. The experimental results are carried out on residual networks and the results are tabulated.", "output": "Variations of Squeeze and Excitation networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes them to a classifier block for effective prediction.
The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.", "output": "An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D scanning as a technique to digitize objects in reality and create their 3D models, is used in many fields and areas. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shapes as primitive curves. To make the network end-to-end trainable, we formulate the curve fitting in a differentiable manner. We develop an adaptive point cloud sampling technique that captures the sharp features better than uniform sampling. The experiments were conducted on a newly introduced large-scale dataset of 50k 3D scans, where the sharp edge annotations were extracted from their parametric CAD models, and demonstrate significant improvement over state-of-the-art methods.", "output": "SepicNet: Sharp Edges Recovery by Parametric Inference of Curves in 3D Shapes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a novel method for human hands tracking using data from an event camera.
The event camera detects changes in brightness, measuring motion, with low latency, no motion blur, low power consumption and high dynamic range. Captured frames are analysed using lightweight algorithms reporting 3D hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and in obstacle avoidance for human-robot safety applications. Events data are pre-processed into intensity frames. The regions of interest (ROI) are defined through object edge event activity, reducing noise. ROI features are extracted for use in depth perception. Event-based tracking of human hand demonstrated feasible, in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% of data reduction in relation to the original, while preserving the features. The depth estimation error in relation to ground truth (measured with wearables), measured using dynamic time warping and using a single event camera, is from 15 to 30 millimetres, depending on the plane it is measured. Tracking of human hands in 3D space using a single event camera data and lightweight algorithms to define ROI features (hands tracking in space).", "output": "Event-based tracking of human hands."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How to estimate the uncertainty of a given model is a crucial problem. Current calibration techniques treat different classes equally and thus implicitly assume that the distribution of training data is balanced, but ignore the fact that real-world data often follows a long-tailed distribution. In this paper, we explore the problem of calibrating the model trained from a long-tailed distribution.
Due to the difference between the imbalanced training distribution and balanced test distribution, existing calibration methods such as temperature scaling can not generalize well to this problem. Specific calibration methods for domain adaptation are also not applicable because they rely on unlabeled target domain instances which are not available. Models trained from a long-tailed distribution tend to be more overconfident to head classes. To this end, we propose a novel knowledge-transferring-based calibration method by estimating the importance weights for samples of tail classes to realize long-tailed calibration. Our method models the distribution of each class as a Gaussian distribution and views the source statistics of head classes as a prior to calibrate the target distributions of tail classes. We adaptively transfer knowledge from head classes to get the target probability density of tail classes. The importance weight is estimated by the ratio of the target probability density over the source probability density. Extensive experiments on CIFAR-10-LT, MNIST-LT, CIFAR-100-LT, and ImageNet-LT datasets demonstrate the effectiveness of our method.", "output": "Transfer Knowledge from Head to Tail: Uncertainty Calibration under Long-tailed Distribution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks have attracted extensive attention from researchers in many fields due to their brain-like information processing mechanism. The proposal of surrogate gradient enables the spiking neural networks to migrate to more complex tasks, and gradually close the gap with the conventional artificial neural networks.
Current spiking neural networks utilize the output of all moments to produce the final prediction, which compromises their temporal characteristics and causes a reduction in performance and efficiency. We propose a temporal knowledge sharing approach (TKS) that enables the interaction of information between different moments, by selecting the output of specific moments to compose teacher signals to guide the training of the network along with the real labels. We have validated TKS on both static datasets CIFAR10, CIFAR100, ImageNet-1k and neuromorphic datasets DVS-CIFAR10, NCALTECH101. Our experimental results indicate that we have achieved the current optimal performance in comparison with other algorithms. Experiments on fine-grained classification datasets further demonstrate our algorithm's superiority with CUB-200-2011, StanfordDogs, and StanfordCars. The TKS algorithm helps the model to have stronger temporal generalization capability, allowing the network to guarantee performance with large time steps in the training phase and with small time steps in the testing phase. This greatly facilitates the deployment of SNNs on edge devices.", "output": "Temporal Knowledge Sharing enable Spiking Neural Network Learning from Past and Future."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference.
To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \\times 1920$ videos.", "output": "DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A reliable perception has to be robust against challenging environmental conditions. Therefore, recent efforts focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and the poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that does not just use the information of the points themselves but also the relationships between the points. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on the object detection and semantic segmentation quality are investigated.
The code is made available as open-source software under", "output": "RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a novel approach to real-time automatic rim detection, classification, and inspection by combining traditional computer vision and deep learning techniques. At the end of every automotive assembly line, a quality control process is carried out to identify any potential defects in the produced cars. Common yet hazardous defects are related, for example, to incorrectly mounted rims. Routine inspections are mostly conducted by human workers that are negatively affected by factors such as fatigue or distraction. We have designed a new prototype to validate whether all four wheels on a single car match in size and type. Additionally, we present three comprehensive open-source databases, CWD1500, WHEEL22, and RB600, for wheel, rim, and bolt detection, as well as rim classification, which are free-to-use for scientific purposes.", "output": "Real-Time Wheel Detection and Rim Classification in Automotive Production."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce NeRD, a new demosaicking method for generating full-color images from Bayer patterns. Our approach leverages advancements in neural fields to perform demosaicking by representing an image as a coordinate-based neural network with sine activation functions. The inputs to the network are spatial coordinates and a low-resolution Bayer pattern, while the outputs are the corresponding RGB values.
An encoder network, which is a blend of ResNet and U-net, enhances the implicit neural representation of the image to improve its quality and ensure spatial consistency through prior learning. Our experimental results demonstrate that NeRD outperforms traditional and state-of-the-art CNN-based methods and significantly closes the gap to transformer-based methods.", "output": "NeRD: Neural field-based Demosaicking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial examples have been found for various deep as well as shallow learning models, and have at various times been suggested to be either fixable model-specific bugs, or else an inherent dataset feature, or both. We present theoretical and empirical results to show that adversarial examples are approximate discontinuities resulting from models that specify approximately bijective maps $f: \\Bbb R^n \\to \\Bbb R^m; n \\neq m$ over their inputs, and this discontinuity follows from the topological invariance of dimension.", "output": "Adversarial Examples from Dimensional Invariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Age is an important variable to describe the expected brain's anatomy status across the normal aging trajectory. The deviation from that normative aging trajectory may provide some insights into neurological diseases. In neuroimaging, predicted brain age is widely used to analyze different diseases. However, using only the brain age gap information (i.e. the difference between the chronological age and the estimated age) may not be informative enough for disease classification problems. In this paper, we propose to extend the notion of global brain age by estimating brain structure ages using structural magnetic resonance imaging.
To this end, an ensemble of deep learning models is first used to estimate a 3D aging map (i.e. voxel-wise age estimation). Then, a 3D segmentation mask is used to obtain the final brain structure ages. This biomarker can be used in several situations. First, it enables accurate estimation of the brain age for the purpose of anomaly detection at the population level. In this situation, our approach outperforms several state-of-the-art methods. Second, brain structure ages can be used to compute the deviation from the normal aging process of each brain structure. This feature can be used in a multi-disease classification task for an accurate differential diagnosis at the subject level. Finally, the brain structure age deviations of individuals can be visualized, providing some insights about brain abnormality and helping clinicians in real medical contexts.", "output": "Brain Structure Ages -- A new biomarker for multi-disease classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task, thus leading to a loss of the versatility of the original model. We introduce \"lossless adaptation\" to address this shortcoming of classical fine-tuning.
We demonstrate that appropriate placement of our parameter-efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation, thus preserving the original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.", "output": "Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonsense knowledge into a pre-trained vision-language model, allowing it to anticipate the caption. Through both qualitative and quantitative evaluations on a customized visual storytelling dataset, A-CAP outperforms other image captioning methods and establishes a strong baseline for anticipation captioning.
We also address the challenges inherent in this task.", "output": "A-CAP: Anticipation Captioning with Commonsense Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios both on new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbates the difficulty of learning new categories without interfering with previous knowledge.
We publicly release our code to encourage future work.", "output": "Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Without access to the source data, source-free domain adaptation (SFDA) transfers knowledge from a source-domain trained model to target domains. Recently, SFDA has gained popularity due to the need to protect the data privacy of the source domain, but it suffers from catastrophic forgetting on the source domain due to the lack of data. To systematically investigate the mechanism of catastrophic forgetting, we first reimplement previous SFDA approaches within a unified framework and evaluate them on four benchmarks. We observe that there is a trade-off between adaptation gain and forgetting loss, which motivates us to design a consistency regularization to mitigate forgetting. In particular, we propose a continual source-free domain adaptation approach named CoSDA, which employs a dual-speed optimized teacher-student model pair and is equipped with consistency learning capability. Our experiments demonstrate that CoSDA outperforms state-of-the-art approaches in continuous adaptation. Notably, our CoSDA can also be integrated with other SFDA methods to alleviate forgetting.", "output": "CoSDA: Continual Source-Free Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications.
This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias term and newly-added scaling factors in specific layers, yet results in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\\times$256 checkpoint while being 30$\\times$ more training efficient than the closest competitor.", "output": "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, using a bag prototype to guide the division of WSI pseudo-bags.
Rather than designing a complex network architecture, this scheme takes a plug-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we specially devise an attention-based prototype that could be optimized dynamically in training to adapt to a classification task. We apply our ProtoDiv scheme on seven baseline models, and then carry out a group of comparison experiments on two public WSI datasets. Experiments confirm our ProtoDiv could usually bring obvious performance improvements to WSI classification.", "output": "ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement in deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. In this paper, we provide an extensive survey of deep learning-based breast cancer imaging research, covering studies on mammogram, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods, publicly available datasets, and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are described in detail.
Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging.", "output": "Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods.
DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.", "output": "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) techniques seek to learn the underlying patterns in data using fewer samples, analogous to how humans learn from limited experience. In this limited-data scenario, the challenges associated with deep neural networks, such as shortcut learning and texture bias behaviors, are further exacerbated. Moreover, the significance of addressing shortcut learning is not yet fully explored in the few-shot setup. To address these issues, we propose LSFSL, which enforces the model to learn more generalizable features utilizing the implicit prior information present in the data. Through comprehensive analyses, we demonstrate that LSFSL-trained models are less vulnerable to alteration in color schemes, statistical correlations, and adversarial perturbations leveraging the global semantics in the data. Our findings highlight the potential of incorporating relevant priors in few-shot approaches to increase robustness and generalization.", "output": "LSFSL: Leveraging Shape Information in Few-shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis.
To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (url{ for video comparisons.", "output": "Learning Controllable 3D Diffusion Models from Single-view Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Burst image processing is becoming increasingly popular in recent years. However, it is a challenging task since individual burst images undergo multiple degradations and often have mutual misalignments resulting in ghosting and zipper artifacts. Existing burst restoration methods usually do not consider the mutual correlation and non-local contextual information among burst frames, which tends to limit these approaches in challenging cases. Another key challenge lies in the robust up-sampling of burst frames. The existing up-sampling methods cannot effectively utilize the advantages of single-stage and progressive up-sampling strategies with conventional and/or recent up-samplers at the same time.
To address these challenges, we propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images. GMTNet consists of three modules optimized for burst processing tasks: Multi-scale Burst Feature Alignment (MBFA) for feature denoising and alignment, Transposed-Attention Feature Merging (TAFM) for multi-frame feature aggregation, and Resolution Transfer Feature Up-sampler (RTFU) to up-scale merged features and construct a high-quality output image. Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.", "output": "Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to estimate the mechanical parameters of fabrics using a casual capture setup with a depth camera. Our approach enables to create mechanically-correct digital representations of real-world textile materials, which is a fundamental step for many interactive design and engineering applications. As opposed to existing capture methods, which typically require expensive setups, video sequences, or manual intervention, our solution can capture at scale, is agnostic to the optical appearance of the textile, and facilitates fabric arrangement by non-expert operators.
To this end, we propose a sim-to-real strategy to train a learning-based framework that can take as input one or multiple images and outputs a full set of mechanical parameters. Thanks to carefully designed data augmentation and transfer learning protocols, our solution generalizes to real images despite being trained only on synthetic data, hence successfully closing the sim-to-real loop. Key in our work is to demonstrate that evaluating the regression accuracy based on the similarity at parameter space leads to inaccurate distances that do not match the human perception. To overcome this, we propose a novel metric for fabric drape similarity that operates on the image domain instead of on the parameter space, allowing us to evaluate our estimation within the context of a similarity rank. We show that our metric correlates with human judgments about the perception of drape similarity, and that our model predictions produce perceptually accurate results compared to the ground truth parameters.", "output": "How Will It Drape Like? Capturing Fabric Mechanics from Depth Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques.
We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.", "output": "Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, there has been an arms race of pose forecasting methods aimed at solving the spatio-temporal task of predicting a sequence of future 3D poses of a person given a sequence of past observed ones. However, the lack of unified benchmarks and limited uncertainty analysis have hindered progress in the field. To address this, we first develop an open-source library for human pose forecasting, featuring multiple models, datasets, and standardized evaluation metrics, with the aim of promoting research and moving toward a unified and fair evaluation. Second, we devise two types of uncertainty in the problem to increase performance and convey better trust: 1) we propose a method for modeling aleatoric uncertainty by using uncertainty priors to inject knowledge about the behavior of uncertainty. This focuses the capacity of the model in the direction of more meaningful supervision while reducing the number of learned parameters and improving stability; 2) we introduce a novel approach for quantifying the epistemic uncertainty of any model through clustering and measuring the entropy of its assignments.
Our experiments demonstrate up to $25\%$ improvements in accuracy and better performance in uncertainty estimation.", "output": "Toward Reliable Human Pose Forecasting with Uncertainty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting their performance in real-world video applications that require action and temporal understanding. In this work, we improve verb understanding for CLIP-based video-language models by proposing a new Verb-Focused Contrastive (VFC) framework. This consists of two main components: (1) leveraging pretrained large language models (LLMs) to create hard negatives for cross-modal contrastive learning, together with a calibration strategy to balance the occurrence of concepts in positive and negative pairs; and (2) enforcing a fine-grained, verb phrase alignment loss. Our method achieves state-of-the-art results for zero-shot performance on three downstream tasks that focus on verb understanding: video-text matching, video question-answering and video classification.
To the best of our knowledge, this is the first work which proposes a method to alleviate the verb understanding problem, and does not simply highlight it.", "output": "Verbs in Action: Improving verb understanding in video-language models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current transformer-based change detection (CD) approaches either employ a pre-trained model trained on large-scale image classification ImageNet dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark. This current strategy is driven by the fact that transformers typically require a large amount of training data to learn inductive biases, which is insufficient in standard CD datasets due to their small size. We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks. Instead of using conventional self-attention that struggles to capture inductive biases when trained from scratch, our architecture utilizes a shuffled sparse-attention operation that focuses on selected sparse informative regions to capture the inherent characteristics of the CD data. Moreover, we introduce a change-enhanced feature fusion (CEFF) module to fuse the features from input image pairs by performing a per-channel re-weighting. Our CEFF module aids in enhancing the relevant semantic changes while suppressing the noisy ones. Extensive experiments on four CD datasets reveal the merits of the proposed contributions, achieving gains as high as 14.27% in intersection-over-union (IoU) score, compared to the best-published results in the literature.
Code is available at url{", "output": "Remote Sensing Change Detection With Transformers Trained from Scratch."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or \"rigged by,\" crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can \"rig\" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism.
Please see the project website: for the supplemental material, video, code, and data.", "output": "DiffusionRig: Learning Personalized Priors for Facial Appearance Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative tasks via prompting fall behind those of large language models, such as GPT-3. Here we explore the idea of visual prompt engineering for solving computer vision tasks beyond classification by editing in image space instead of text. In particular, we discover an emergent ability of CLIP, where, by simply drawing a red circle around an object, we can direct the model's attention to that region, while also maintaining global information. We show the power of this simple approach by achieving state-of-the-art in zero-shot referring expressions comprehension and strong performance in keypoint localization tasks. Finally, we draw attention to some potential ethical concerns of large language-vision models.", "output": "What does CLIP know about a red circle? Visual prompt engineering for VLMs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.", "output": "Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale models pre-trained on large-scale datasets have profoundly advanced the development of deep learning. However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions. Further scaling them up to higher orders of magnitude is rarely explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacities. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. Our STU-Net is based on nnU-Net framework due to its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable.
Then, we empirically evaluate different scaling combinations of network depth and width, discovering that it is optimal to scale model depth and width together. We train our scalable STU-Net models on a large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain. This observation reveals that a large model is promising in medical image segmentation. Furthermore, we evaluate the transferability of our model on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. We observe good performance of our pre-trained model in both direct inference and fine-tuning. The code and pre-trained models are available at ", "output": "STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost.
Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at ", "output": "Representing Volumetric Videos as Dynamic MLP Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the growing demand for interactive AI systems, there have been few comprehensive studies on human-AI interaction in visual understanding, e.g. segmentation. Inspired by the development of prompt-based universal interfaces for LLMs, this paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image. SEEM has four desiderata: i) Versatility: by introducing a versatile prompting engine for different types of prompts, including points, boxes, scribbles, masks, texts, and referred regions of another image; ii) Compositionality: by learning a joint visual-semantic space for visual and textual prompts to compose queries on the fly for inference as shown in Fig 1; iii) Interactivity: by incorporating learnable memory prompts to retain dialog history information via mask-guided cross-attention; and iv) Semantic-awareness: by using a text encoder to encode text queries and mask labels for open-vocabulary segmentation.", "output": "Segment Everything Everywhere All at Once."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent advances in camera-based bird's eye view (BEV) representation exhibit great potential for in-vehicle 3D perception.
Despite the substantial progress achieved on standard benchmarks, the robustness of BEV algorithms has not been thoroughly examined, which is critical for safe operations. To bridge this gap, we introduce RoboBEV, a comprehensive benchmark suite that encompasses eight distinct corruptions, including Bright, Dark, Fog, Snow, Motion Blur, Color Quant, Camera Crash, and Frame Lost. Based on it, we undertake extensive evaluations across a wide range of BEV-based models to understand their resilience and reliability. Our findings indicate a strong correlation between absolute performance on in-distribution and out-of-distribution datasets. Nonetheless, there are considerable variations in relative performance across different approaches. Our experiments further demonstrate that pre-training and depth-free BEV transformation has the potential to enhance out-of-distribution robustness. Additionally, utilizing long and rich temporal information largely helps with robustness. Our findings provide valuable insights for designing future BEV models that can achieve both accuracy and robustness in real-world deployments.", "output": "RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote.
We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.", "output": "Expressive Text-to-Image Generation with Rich Text."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data.
We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.", "output": "A method to integrate and classify normal distributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of {\it dependent compositional functions} with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with {\it provable convergence guarantee under mild conditions} by leveraging recent advances in stochastic compositional optimization. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence.
The SOAP has been implemented in the libAUC library at url{", "output": "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair. HOI is considered one of the fundamental steps in truly understanding complex visual scenes. For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images that highlight the interactions between human object pairs. This issue is addressed by the novel self-attention based guided transformer network, GTNet. GTNet encodes this spatial contextual information in human and object visual features via self-attention while achieving state of the art results on both the V-COCO and HICO-DET datasets. Code will be made available online.", "output": "GTNet:Guided Transformer Network for Detecting Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As a fundamental yet challenging problem in intelligent transportation systems, point cloud registration attracts vast attention and has been attained with various deep learning-based algorithms. The unsupervised registration algorithms take advantage of deep neural network-enabled novel representation learning while requiring no human annotations, making them applicable to industrial applications.
However, unsupervised methods mainly depend on global descriptors, which ignore the high-level representations of local geometries. In this paper, we propose to jointly use both global and local descriptors to register point clouds in a self-supervised manner, which is motivated by a critical observation that all local geometries of point clouds are transformed consistently under the same transformation. Therefore, local geometries can be employed to enhance the representation ability of the feature extraction module. Moreover, the proposed local descriptor is flexible and can be integrated into most existing registration methods and improve their performance. Besides, we also utilize point cloud reconstruction and normal estimation to enhance the transformation awareness of global and local descriptors. Lastly, extensive experimental results on one synthetic and three real-world datasets demonstrate that our method outperforms existing state-of-art unsupervised registration methods and even surpasses supervised ones in some cases. Robustness and computational efficiency evaluations also indicate that the proposed method applies to intelligent vehicles.", "output": "Self-Supervised Point Cloud Registration with Deep Versatile Descriptors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them in scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs.
Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner. GSF leverages grouped spatial gating to decompose input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into an efficient and high performing spatio-temporal feature extractor, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks. Code and models will be made publicly available at", "output": "Gate-Shift-Fuse for Video Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks.
We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps. Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction while even amateur human testers on MTurk have 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.", "output": "Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images.
Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.", "output": "Generative Modelling With Inverse Heat Dissipation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Densely annotating LiDAR point clouds is costly, which restrains the scalability of fully-supervised learning methods. In this work, we study the underexplored semi-supervised learning (SSL) in LiDAR segmentation. Our core idea is to leverage the strong spatial cues of LiDAR point clouds to better exploit unlabeled data. We propose LaserMix to mix laser beams from different LiDAR scans, and then encourage the model to make consistent and confident predictions before and after mixing. Our framework has three appealing properties: 1) Generic: LaserMix is agnostic to LiDAR representations (e.g., range view and voxel), and hence our SSL framework can be universally applied. 2) Statistically grounded: We provide a detailed analysis to theoretically explain the applicability of the proposed framework. 3) Effective: Comprehensive experimental analysis on popular LiDAR segmentation datasets (nuScenes, SemanticKITTI, and ScribbleKITTI) demonstrates our effectiveness and superiority. Notably, we achieve competitive results over fully-supervised counterparts with 2x to 5x fewer labels and improve the supervised-only baseline significantly by 10.8% on average. We hope this concise yet high-performing framework could facilitate future research in semi-supervised LiDAR segmentation.
Code is publicly available.", "output": "LaserMix for Semi-Supervised LiDAR Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from different domains of geometric structures and visual appearances. In the current deep learning era, remarkable progress in processing such two data modalities has been achieved through respectively customizing compatible 3D and 2D network architectures. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance in several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, due to the difficulty of extracting discriminative features from irregular geometric signals. In this paper, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Generally, we propose PointMCD, a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are reasonably aggregated into view-specific geometric descriptors. By pair-wisely aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method.
The code and data will be publicly available at ", "output": "PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel transfer-based targeted attack method that optimizes the adversarial perturbations without any extra training efforts for auxiliary networks on training data. Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks. Therefore, we propose to make the perturbation to be agnostic to different local regions within one image, which we called as self-universality. Instead of optimizing the perturbations on different images, optimizing on different regions to achieve self-universality can get rid of using extra data. Specifically, we introduce a feature similarity loss that encourages the learned perturbations to be universal by maximizing the feature similarity between adversarial perturbed global images and randomly cropped local regions. With the feature similarity loss, our method makes the features from adversarial perturbations to be more dominant than that of benign images, hence improving targeted transferability. We name the proposed attack method as Self-Universality (SU) attack. Extensive experiments demonstrate that SU can achieve high success rates for transfer-based targeted attacks. On ImageNet-compatible dataset, SU yields an improvement of 12% compared with existing state-of-the-art methods. 
Code is available at ", "output": "Enhancing the Self-Universality for Transferable Targeted Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces to perform semantic feature quantization, rendering the data compatible with digital communication systems. Furthermore, we employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator. Experimental results demonstrate that the proposed VQ-DeepSC is more robust than BPG in digital communication systems and has comparable MS-SSIM performance to the DeepJSCC method.", "output": "Vector Quantized Semantic Communication System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, neural distance functions trained via volumetric ray marching have been widely adopted for multi-view 3D reconstruction. These methods, however, apply the ray marching procedure for the entire scene volume, leading to reduced sampling efficiency and, as a result, lower reconstruction quality in the areas of high-frequency details. In this work, we address this problem via joint training of the implicit function and our new coarse sphere-based surface reconstruction. 
We use the coarse representation to efficiently exclude the empty volume of the scene from the volumetric ray marching procedure without additional forward passes of the neural surface network, which leads to an increased fidelity of the reconstructions compared to the base systems. We evaluate our approach by incorporating it into the training procedures of several implicit surface modeling methods and observe uniform improvements across both synthetic and real-world datasets. Our codebase can be accessed via the project page:", "output": "Sphere-Guided Training of Neural Implicit Surfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. 
For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.", "output": "On Distillation of Guided Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal especially 3D reasoning models. 
We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.", "output": "SQA3D: Situated Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.", "output": "Iterative Teaching by Data Hallucination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a post-processor, called NeighborTrack, that leverages neighbor information of the tracking target to validate and improve single-object tracking (SOT) results. It requires no additional data or retraining. Instead, it uses the confidence score predicted by the backbone SOT network to automatically derive neighbor information and then uses this information to improve the tracking results. 
When tracking an occluded target, its appearance features are untrustworthy. However, a general siamese network often cannot tell whether the tracked object is occluded by reading the confidence score alone, because it could be misled by neighbors with high confidence scores. Our proposed NeighborTrack takes advantage of unoccluded neighbors' information to reconfirm the tracking target and reduces false tracking when the target is occluded. It not only reduces the impact caused by occlusion, but also fixes tracking problems caused by object appearance changes. NeighborTrack is agnostic to SOT networks and post-processing methods. For the VOT challenge dataset commonly used in short-term object tracking, we improve three famous SOT networks, Ocean, TransT, and OSTrack, by an average of ${1.92%}$ EAO and ${2.11%}$ robustness. For the mid- and long-term tracking experiments based on OSTrack, we achieve state-of-the-art ${72.25%}$ AUC on LaSOT and ${75.7%}$ AO on GOT-10K. Code duplication can be found in", "output": "NeighborTrack: Improving Single Object Tracking by Bipartite Matching with Neighbor Tracklets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. 
We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at:", "output": "Token Turing Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Tensor4D, an efficient yet effective approach to dynamic scene modeling. The key of our solution is an efficient 4D tensor decomposition method so that the dynamic scene can be directly represented as a 4D spatio-temporal tensor. To tackle the accompanying memory issue, we decompose the 4D tensor hierarchically by projecting it first into three time-aware volumes and then nine compact feature planes. In this way, spatial information over time can be simultaneously captured in a compact and memory-efficient manner. When applying Tensor4D for dynamic scene reconstruction and rendering, we further factorize the 4D fields to different scales in the sense that structural motions and dynamic detailed changes can be learned from coarse to fine. The effectiveness of our method is validated on both synthetic and real-world scenes. Extensive experiments show that our method is able to achieve high-quality dynamic reconstruction and rendering from sparse-view camera rigs or even a monocular camera. 
The code and dataset will be released at ", "output": "Tensor4D : Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features in a large field which are important in an image. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed a scale-aware attention in this implicit attention network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate CiaoSR significantly outperforms the existing single image SR methods with the same backbone. In addition, CiaoSR also achieves the state-of-the-art performance on the arbitrary-scale SR task. 
The effectiveness of the method is also demonstrated on the real-world SR setting. More importantly, CiaoSR can be flexibly integrated into any backbone to improve the SR performance.", "output": "CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. 
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.", "output": "Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.", "output": "Face Generation and Editing with StyleGAN: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep convolutional neural networks (DCNNs) based remote sensing (RS) image semantic segmentation technology has achieved great success used in many real-world applications such as geographic element analysis. However, strong dependency on annotated data of specific scene makes it hard for DCNNs to fit different RS scenes. 
To solve this problem, recent works gradually focus on cross-domain RS image semantic segmentation task. In this task, different ground sampling distance, remote sensing sensor variation and different geographical landscapes are three main factors causing dramatic domain shift between source and target images. To decrease the negative influence of domain shift, we propose a self-training guided disentangled adaptation network (ST-DASegNet). We first propose source student backbone and target student backbone to respectively extract the source-style and target-style feature for both source and target images. Towards the intermediate output feature maps of each backbone, we adopt adversarial learning for alignment. Then, we propose a domain disentangled module to extract the universal feature and purify the distinct feature of source-style and target-style features. Finally, these two features are fused and served as input of source student decoder and target student decoder to generate final predictions. Based on our proposed domain disentangled module, we further propose exponential moving average (EMA) based cross-domain separated self-training mechanism to ease the instability and disadvantageous effect during adversarial optimization. Extensive experiments and analysis on benchmark RS datasets show that ST-DASegNet outperforms previous methods on cross-domain RS image semantic segmentation task and achieves state-of-the-art (SOTA) results. Our code is available at", "output": "Self-Training Guided Disentangled Adaptation for Cross-Domain Remote Sensing Image Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce LYSTO, the Lymphocyte Assessment Hackathon, which was held in conjunction with the MICCAI 2019 Conference in Shenzhen (China). 
The competition required participants to automatically assess the number of lymphocytes, in particular T-cells, in histopathological images of colon, breast, and prostate cancer stained with CD3 and CD8 immunohistochemistry. Differently from other challenges setup in medical image analysis, LYSTO participants were solely given a few hours to address this problem. In this paper, we describe the goal and the multi-phase organization of the hackathon; we describe the proposed methods and the on-site results. Additionally, we present post-competition results where we show how the presented methods perform on an independent set of lung cancer slides, which was not part of the initial competition, as well as a comparison on lymphocyte assessment between presented methods and a panel of pathologists. We show that some of the participants were capable to achieve pathologist-level performance at lymphocyte assessment. After the hackathon, LYSTO was left as a lightweight plug-and-play benchmark dataset on grand-challenge website, together with an automatic evaluation platform. LYSTO has supported a number of research in lymphocyte assessment in oncology. LYSTO will be a long-lasting educational challenge for deep learning and digital pathology, it is available at ", "output": "LYSTO: The Lymphocyte Assessment Hackathon and Benchmark Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-target multi-camera tracking (MTMCT) of vehicles, i.e. tracking vehicles across multiple cameras, is a crucial application for the development of smart city and intelligent traffic system. The main challenges of MTMCT of vehicles include the intra-class variability of the same vehicle and inter-class similarity between different vehicles and how to associate the same vehicle accurately across different cameras under large search space. 
Previous methods for MTMCT usually use hierarchical clustering of trajectories to conduct cross camera association. However, the search space can be large and does not take spatial and temporal information into consideration. In this paper, we proposed a transformer-based camera link model with spatial and temporal filtering to conduct cross camera tracking. Achieving 73.68% IDF1 on the Nvidia Cityflow V2 dataset test set, showing the effectiveness of our camera link model on multi-target multi-camera tracking.", "output": "Multi-target multi-camera vehicle tracking using transformer-based camera link model and spatial-temporal information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. 
For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.", "output": "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recognition of facial emotions is an essential aspect of human communication, allowing individuals to understand emotions conveyed by facial expressions and vocal tones. The field of Facial Emotion Recognition (FER) is of great significance in the areas of computer vision and artificial intelligence, with vast commercial and academic potential in fields such as security, advertising, and entertainment. We propose a FER framework that employs Swin Vision Transformers (SwinT) and squeeze and excitation block (SE) to address vision tasks. The approach uses a transformer model with an attention mechanism, SE, and SAM to improve the efficiency of the model, as transformers often require a large amount of data. Our focus was to create an efficient FER model based on SwinT architecture that can recognize facial emotions using minimal data. 
We trained our model on a hybrid dataset and evaluated its performance on the AffectNet dataset, achieving an F1-score of 0.5420, which surpassed the winner of the Affective Behavior Analysis in the Wild (ABAW) Competition held at the European Conference on Computer Vision (ECCV) 2022", "output": "Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Remote sensing hyperspectral and more generally spectral instruments are common tools to decipher surface features in Earth and Planetary science. While linear mixture is the most common approximation for compounds detection (mineral, water, ice, etc...), the transfer of light in surface and atmospheric medium are highly non-linear. The exact simulation of non-linearities can be estimated at very high numerical cost. Here I propose a very simple non-linear form (that includes the regular linear area mixture) of radiative transfer to approximate surface spectral feature. I demonstrate that this analytical form is able to approximate the grain size and intimate mixture dependence of surface features. In addition, the same analytical form can approximate the effect of Martian mineral aerosols. Unfortunately, Earth aerosols are more complex (water droplet, water ice, soot,...) and are not expected to follow the same trend.", "output": "Approximation of radiative transfer for surface spectral features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. 
However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we present evidence-empowered transfer learning for AD diagnosis. Unlike conventional approaches, we leverage an AD-relevant auxiliary task, namely morphological change prediction, without requiring additional MRI data. In this auxiliary task, the diagnosis model learns the evidential and transferable knowledge from morphological features in MRI scans. Experimental results demonstrate that our framework is not only effective in improving detection performance regardless of model capacity, but also more data-efficient and faithful.", "output": "Evidence-empowered Transfer Learning for Alzheimer's Disease."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Local feature matching is challenging due to textureless and repetitive patterns. Existing methods focus on using appearance features and global interaction and matching, while the importance of geometry priors in local feature matching has not been fully exploited. Different from these methods, in this paper, we delve into the importance of geometry prior and propose Structured Epipolar Matcher (SEM) for local feature matching, which can leverage the geometric information in an iterative matching way. The proposed model enjoys several merits. First, our proposed Structured Feature Extractor can model the relative positional relationship between pixels and high-confidence anchor points. Second, our proposed Epipolar Attention and Matching can filter out irrelevant areas by utilizing the epipolar constraint. Extensive experimental results on five standard benchmarks demonstrate the superior performance of our SEM compared to state-of-the-art methods. 
Project page: ", "output": "Structured Epipolar Matcher for Local Feature Matching."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets often contain data that are meticulously cleaned. Such configurations, however, cannot reflect the reliability of perception models during the deployment stage. In this work, we present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios against natural corruptions that occur in real-world environments. Specifically, we consider eight corruption types stemming from adversarial weather conditions, external disturbances, and internal sensor failure. We uncover that, although promising results have been progressively achieved on standard benchmarks, state-of-the-art 3D perception models are at risk of being vulnerable to corruptions. We draw key observations on the use of data representations, augmentation schemes, and training strategies, that could severely affect the model's performance. To pursue better robustness, we propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency. We hope our benchmark and approach could inspire future research in designing more robust and reliable 3D perception models. 
Our robustness benchmark suite is publicly available.", "output": "Robo3D: Towards Robust and Reliable 3D Perception against Corruptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code is made available at url{", "output": "Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches that perform well in one task often exhibit poor performance in the other. 
This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.", "output": "EGC: Image Generation and Classification via a Diffusion Energy-Based Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial attention has been widely used to improve the performance of convolutional neural networks by allowing them to focus on important information. However, it has certain limitations. In this paper, we propose a new perspective on the effectiveness of spatial attention, which is that it can solve the problem of convolutional kernel parameter sharing. Despite this, the information contained in the attention map generated by spatial attention is not sufficient for large-size convolutional kernels. Therefore, we introduce a new attention mechanism called Receptive-Field Attention (RFA). 
While previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) only focus on spatial features, they cannot fully address the issue of convolutional kernel parameter sharing. In contrast, RFA not only focuses on the receptive-field spatial feature but also provides effective attention weights for large-size convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), developed by RFA, represents a new approach to replace the standard convolution operation. It offers a nearly negligible increment of computational cost and parameters, while significantly improving network performance. We conducted a series of experiments on ImageNet-1k, MS COCO, and VOC datasets, which demonstrated the superiority of our approach in various tasks including classification, object detection, and semantic segmentation. Of particular importance, we believe that it is time to shift focus from spatial features to receptive-field spatial features for current spatial attention mechanisms. By doing so, we can further improve network performance and achieve even better results. The code and pre-trained models for the relevant tasks can be found at", "output": "RFAConv: Innovating Spatial Attention and Standard Convolutional Operation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment. Although there have been recent advances, generating detailed and enriched video descriptions continues to be a substantial challenge. In this work, we introduce Video ChatCaptioner, an innovative approach for creating more comprehensive spatiotemporal video descriptions.
Our method employs a ChatGPT model as a controller, specifically designed to select frames for posing video content-driven questions. Subsequently, a robust algorithm is utilized to answer these visual queries. This question-answer framework effectively uncovers intricate video details and shows promise as a method for enhancing video content. Following multiple conversational rounds, ChatGPT can summarize enriched video content based on previous conversations. We qualitatively demonstrate that our Video ChatCaptioner can generate captions containing more visual details about the videos. The code is publicly available at ", "output": "Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Avoiding the introduction of ghosts when synthesising LDR images as high dynamic range (HDR) images is a challenging task. Convolutional neural networks (CNNs) are effective for HDR ghost removal in general, but struggle to deal with LDR images if there are large movements or oversaturation/undersaturation. Existing dual-branch methods combining CNN and Transformer omit part of the information from non-reference images, while the features extracted by the CNN-based branch are bound to the kernel size with a small receptive field, which are detrimental to the deblurring and the recovery of oversaturated/undersaturated regions. In this paper, we propose a novel hierarchical dual Transformer method for ghost-free HDR (HDT-HDR) image generation, which extracts global features and local features simultaneously. First, we use a CNN-based head with spatial attention mechanisms to extract features from all the LDR images. Second, the LDR features are delivered to the Hierarchical Dual Transformer (HDT).
In each Dual Transformer (DT), the global features are extracted by the window-based Transformer, while the local details are extracted using the channel attention mechanism with deformable CNNs. Finally, the ghost-free HDR image is obtained by dimensional mapping on the HDT output. Abundant experiments demonstrate that our HDT-HDR achieves the state-of-the-art performance among existing HDR ghost removal methods.", "output": "High Dynamic Range Imaging with Context-aware Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High dynamic range (HDR) video reconstruction is attracting more and more attention due to the superior visual quality compared with that of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large-scale LDR-HDR video dataset with 85 scenes, where each scene contains 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction.
Our dataset and code will be released after the acceptance of this paper.", "output": "HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-object tracking in sports scenes plays a critical role in gathering player statistics, supporting further analysis, such as automatic tactical analysis. Yet existing MOT benchmarks cast little attention on the domain, limiting its development. In this work, we present a new large-scale multi-object tracking dataset in diverse sports scenes, coined as emph{SportsMOT}, where all players on the court are supposed to be tracked. It consists of 240 video sequences, over 150K frames (almost 15 times MOT17) and over 1.6M bounding boxes (3 times MOT17) collected from 3 sports categories, including basketball, volleyball and football. Our dataset is characterized by two key properties: 1) fast and variable-speed motion and 2) similar yet distinguishable appearance. We expect SportsMOT to encourage MOT trackers to improve in both motion-based association and appearance-based association. We benchmark several state-of-the-art trackers and reveal that the key challenge of SportsMOT lies in object association. To alleviate the issue, we further propose a new multi-object tracking framework, termed as emph{MixSort}, introducing a MixFormer-like structure as an auxiliary association model to prevailing tracking-by-detection trackers. By integrating the customized appearance-based association with the original motion-based association, MixSort achieves state-of-the-art performance on SportsMOT and MOT17. Based on MixSort, we give an in-depth analysis and provide some profound insights into SportsMOT.
The dataset and code will be available at", "output": "SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Meta AI Research approaches a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B). Without a doubt, the emergence of SAM will yield significant benefits for a wide array of practical image segmentation applications. In this study, we conduct a series of intriguing investigations into the performance of SAM across various applications, particularly in the fields of natural images, agriculture, manufacturing, remote sensing, and healthcare. We analyze and discuss the benefits and limitations of SAM and provide an outlook on future development of segmentation tasks. Note that our work does not intend to propose new algorithms or theories, but rather provide a comprehensive view of SAM in practice. This work is expected to provide insights that facilitate future research activities toward generic segmentation.", "output": "Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of approaches for trajectory prediction requires metrics to validate and compare their performance. Currently established metrics are based on Euclidean distance, which means that errors are weighted equally in all directions. Euclidean metrics are insufficient for structured environments like roads, since they do not properly capture the agent's intent relative to the underlying lane.
In order to provide a reasonable assessment of trajectory prediction approaches with regard to the downstream planning task, we propose a new metric that is lane distance-based: Lane Miss Rate (LMR). For the calculation of LMR, the ground-truth and predicted endpoints are assigned to lane segments, more precisely their centerlines. Measured by the distance along the lane segments, predictions that are within a certain threshold distance to the ground-truth count as hits; otherwise they count as misses. LMR is then defined as the ratio of sequences that yield a miss. Our results on three state-of-the-art trajectory prediction models show that LMR preserves the order of Euclidean distance-based metrics. In contrast to the Euclidean Miss Rate, qualitative results show that LMR yields misses for sequences where predictions are located on wrong lanes. Hits, on the other hand, result for sequences where predictions are located on the correct lane. This means that LMR implicitly weights the Euclidean error relative to the lane and goes in the direction of capturing the intents of traffic agents. The source code of LMR for Argoverse 2 is publicly available.", "output": "LMR: Lane Distance-Based Metric for Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ImageReward -- the first general-purpose text-to-image human preference reward model -- to address various prevalent issues in generative models and align them with human values and preferences. Its training is based on our systematic annotation pipeline that covers both the rating and ranking components, collecting a dataset of 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP by 38.6%), making it a promising automatic metric for evaluating and improving text-to-image synthesis.
The reward model is publicly available via the texttt{image-reward} package at url{", "output": "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The swift and precise detection of vehicles holds significant research significance in intelligent transportation systems (ITS). However, current vehicle detection algorithms encounter challenges such as high computational complexity, low detection rate, and limited feasibility on mobile devices. To address these issues, this paper proposes a lightweight vehicle detection algorithm for YOLOv7-tiny called Ghost-YOLOv7. The model first scales the width multiple to 0.5 and replaces the standard convolution of the backbone network with Ghost convolution to achieve a lighter network and improve the detection speed; secondly, a Ghost bi-directional feature pyramid network (Ghost-BiFPN) neck network is designed to enhance the feature extraction capability of the algorithm and enrich semantic information; thirdly, a Ghost Decoupled Head (GDH) is employed for accurate prediction of vehicle location and class, enhancing model accuracy; finally, a coordinate attention mechanism is introduced in the output layer to suppress environmental interference, and the WIoU loss function is employed to further enhance the detection accuracy. Experimental results on the PASCAL VOC dataset demonstrate that Ghost-YOLOv7 outperforms the original YOLOv7-tiny model, achieving a 29.8% reduction in computation, a 37.3% reduction in the number of parameters, a 35.1% reduction in model weights, and a 1.1% higher mean average precision (mAP), while achieving a detection speed of 428 FPS.
These results validate the effectiveness of the proposed method.", "output": "Fast vehicle detection algorithm based on lightweight YOLOv7-tiny."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images, achieving $94.86%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62%$. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime.
Our code and models will be released at a later date before the conference.", "output": "Astroformer: More Data Might Not be All You Need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology to translate the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in the fairness decision making within ML design and support ML development teams to identify, mitigate and monitor bias at each step of ML systems development. The process also provides guidance on how to explain the always imperfect trade-offs in terms of bias to users.", "output": "Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning models for scoring sleep stages based on single-channel EEG have been proposed as a promising method for remote sleep monitoring. However, applying these models to new datasets, particularly from wearable devices, raises two questions. First, when annotations on a target dataset are unavailable, which different data characteristics affect the sleep stage scoring performance the most, and by how much? Second, when annotations are available, which dataset should be used as the source of transfer learning to optimize performance? In this paper, we propose a novel method for computationally quantifying the impact of different data characteristics on the transferability of deep learning models.
Quantification is accomplished by training and evaluating two models with significant architectural differences, TinySleepNet and U-Time, under various transfer configurations in which the source and target datasets have different recording channels, recording environments, and subject conditions. For the first question, the environment had the highest impact on sleep stage scoring performance, with performance degrading by over 14% when sleep annotations were unavailable. For the second question, the most useful transfer sources for the TinySleepNet and U-Time models were MASS-SS1 and ISRUC-SG1, containing a high percentage of N1 (the rarest sleep stage) relative to the others. The frontal and central EEGs were preferred for TinySleepNet. The proposed approach enables full utilization of existing sleep datasets for training and planning model transfer to maximize the sleep stage scoring performance on a target problem when sleep annotations are limited or unavailable, supporting the realization of remote sleep monitoring.", "output": "Quantifying the Impact of Data Characteristics on the Transferability of Sleep Stage Scoring Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative trading and presents a case study of an RL-based trading algorithm. The results show that RL can be a powerful tool for quantitative trading, and that it has the potential to outperform traditional trading algorithms.
The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using reinforcement learning in quantitative trading and highlights the importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial markets and generate greater returns for investors.", "output": "Quantitative Trading using Deep Q Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wearable devices for seizure monitoring and detection could significantly improve the quality of life of epileptic patients. However, existing solutions that mostly rely on the full electrode set of electroencephalogram (EEG) measurements could be inconvenient for everyday use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detector (called the teacher) trained on data from the full set of electrodes to learn new detectors (called the student). They both provide lightweight implementations and significantly reduce the number of electrodes needed for recording the EEG. We consider the case where the teacher and the student seizure detectors are graph neural networks (GNN), since these architectures actively use the connectivity information.
We consider two cases: (a) when a single student is learnt for all the patients using preselected channels; and (b) when personalized students are learnt for every individual patient, with personalized channel selection using a Gumbel-softmax approach. Our experiments on the publicly available Temple University Hospital EEG Seizure Data Corpus (TUSZ) show that both knowledge distillation and personalization play significant roles in improving the performance of seizure detection, particularly for patients with scarce EEG data. We observe that using as few as two channels, we are able to obtain competitive seizure detection performance. This, in turn, shows the potential of our approach in the more realistic scenario of wearable devices for personalized monitoring of seizures, even with few recordings.", "output": "Knowledge-Distilled Graph Neural Networks for Personalized Epileptic Seizure Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The proliferation of multi-unit cortical recordings over the last two decades, especially in macaques and during motor-control tasks, has generated interest in neural \"population dynamics\": the time evolution of neural activity across a group of neurons working together. A good model of these dynamics should be able to infer the activity of unobserved neurons within the same population and of the observed neurons at future times. Accordingly, Pandarinath and colleagues have introduced a benchmark to evaluate models on these two (and related) criteria: four data sets, each consisting of firing rates from a population of neurons, recorded from macaque cortex during movement-related tasks.
Here we show that simple, general-purpose architectures based on recurrent neural networks (RNNs) outperform more \"bespoke\" models, and indeed outperform all published models on all four data sets in the benchmark. Performance can be improved further still with a novel, hybrid architecture that augments the RNN with self-attention, as in transformer networks. But pure transformer models fail to achieve this level of performance, either in our work or that of other groups. We argue that the autoregressive bias imposed by RNNs is critical for achieving the highest levels of performance. We conclude, however, by proposing that the benchmark be augmented with an alternative evaluation of latent dynamics that favors generative over discriminative models like the ones we propose in this report.", "output": "Inferring Population Dynamics in Macaque Cortex."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Next-generation cars embed intelligent assessment of car driving safety through innovative solutions often based on the usage of artificial intelligence. The safety driving monitoring can be carried out using several methodologies widely treated in the scientific literature. In this context, the author proposes an innovative approach that uses an ad-hoc bio-sensing system suitable to reconstruct the physio-based attentional status of the car driver. To reconstruct the car driver's physiological status, the author proposed the use of a bio-sensing probe consisting of coupled LEDs at the Near infrared (NiR) spectrum with a photodetector. This probe placed over the monitored subject allows the detection of a physiological signal called PhotoPlethysmoGraphy (PPG).
The PPG signal formation is regulated by the change in oxygenated and non-oxygenated hemoglobin concentration in the monitored subject's bloodstream, which is directly connected to cardiac activity, in turn regulated by the Autonomic Nervous System (ANS) that characterizes the subject's attention level. The car driver drowsiness monitoring designed in this way will be combined with further driving safety assessment based on correlated intelligent driving scenario understanding.", "output": "Deep Learning Systems for Advanced Driving Assistance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent surge in the number of Electric Vehicles has created a need to develop inexpensive, energy-dense Battery Storage Systems. Many countries across the planet have put in place concrete measures to reduce and subsequently limit the number of vehicles powered by fossil fuels. Lithium-ion based batteries are presently dominating the electric automotive sector. Energy research efforts are also focussed on accurate computation of the State-of-Charge of such batteries to provide reliable vehicle range estimates. Although such estimation algorithms provide precise estimates, all such techniques available in the literature presume the availability of superior-quality battery datasets. In reality, gaining access to proprietary battery usage datasets is very tough for battery scientists. Moreover, open-access datasets lack the diverse battery charge/discharge patterns needed to build generalized models. Curating battery measurement data is time-consuming and needs expensive equipment. To surmount such limited-data scenarios, we introduce a few Deep Learning-based methods to synthesize high-fidelity battery datasets; these augmented synthetic datasets will help battery researchers build better estimation models in the presence of limited data.
We have released the code and dataset used in the present approach to generate synthetic data. The battery data augmentation techniques introduced here will alleviate limited battery dataset challenges.", "output": "A Deep Learning Approach Towards Generating High-fidelity Diverse Synthetic Battery Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve nonlinear equations in complex material models. Additionally, strategies are provided to reduce the required order of derivation for obtaining the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or non-active simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as local damage models for interface cracking behavior with a nonlinear softening law.
Finally, we discuss the potential and remaining challenges for future developments of this new approach.", "output": "Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating the local search behavior and obtaining comparable results to the local search algorithms. However, the over-smoothing and the information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits the local search behavior while providing practical scalability. Using the RELS-DQN model trained on one application, it can generalize to various applications by providing solution values higher than or equal to both the local search algorithms and the existing DQN models while remaining efficient in runtime and memory.", "output": "RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the past decade, neural network (NN)-based controllers have demonstrated remarkable efficacy in a variety of decision-making tasks.
However, their black-box nature and the risk of unexpected behaviors and surprising results pose a challenge to their deployment in real-world systems with strong guarantees of correctness and safety. We address these limitations by investigating the transformation of NN-based controllers into equivalent soft decision tree (SDT)-based controllers and its impact on verifiability. Differently from previous approaches, we focus on discrete-output NN controllers including rectified linear unit (ReLU) activation functions as well as argmax operations. We then devise an exact but cost-effective transformation algorithm, in that it can automatically prune redundant branches. We evaluate our approach using two benchmarks from the OpenAI Gym environment. Our results indicate that the SDT transformation can benefit formal verification, showing runtime improvements of up to 21x and 2x for MountainCar-v0 and CartPole-v0, respectively.", "output": "Exact and Cost-Effective Automated Transformation of Neural Network Controllers to Decision Tree Controllers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues.
To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. To our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at ", "output": "Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deploying deep learning models in real-world certified systems requires the ability to provide confidence estimates that accurately reflect their uncertainty. In this paper, we demonstrate the use of the conformal prediction framework to construct reliable and trustworthy predictors for detecting railway signals. Our approach is based on a novel dataset that includes images taken from the perspective of a train operator and state-of-the-art object detectors. We test several conformal approaches and introduce a new method based on conformal risk control.
Our findings demonstrate the potential of the conformal prediction framework to evaluate model performance and provide practical guidance for achieving formally guaranteed uncertainty bounds.", "output": "Confident Object Detection via Conformal Prediction and Conformal Risk Control: an Application to Railway Signaling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Landslide susceptibility prediction has always been an important and challenging task. However, there are some uncertain problems to be solved in susceptibility modeling, such as the error of landslide samples and the complex nonlinear relationship between environmental factors. A self-screening graph convolutional network and long short-term memory network (SGCN-LSTM) is proposed in this paper to overcome the above problems in landslide susceptibility prediction. The SGCN-LSTM model has the advantages of wide width and good learning ability. The landslide samples with large errors outside the set threshold interval are eliminated by the self-screening network, and the nonlinear relationship between environmental factors can be extracted from both spatial nodes and time series, so as to better simulate the nonlinear relationship between environmental factors.
The SGCN-LSTM model was applied to landslide susceptibility prediction in Anyuan County, Jiangxi Province, China, and compared with Cascade-parallel Long Short-Term Memory and Conditional Random Fields (CPLSTM-CRF), Random Forest (RF), Support Vector Machine (SVM), Stochastic Gradient Descent (SGD) and Logistic Regression (LR) models. The landslide prediction experiment in Anyuan County showed that the total accuracy and AUC of the SGCN-LSTM model were the highest among the six models, and the total accuracy reached 92.38%, which was 5.88%, 12.44%, 19.65%, 19.92% and 20.34% higher than those of the CPLSTM-CRF, RF, SVM, SGD and LR models, respectively. The AUC value reached 0.9782, which was 0.0305, 0.0532, 0.1875, 0.1909 and 0.1829 higher than the other five models, respectively. In conclusion, compared with some existing traditional machine learning models, the SGCN-LSTM model proposed in this paper has higher landslide prediction accuracy and better robustness, and has a good application prospect in the LSP field.", "output": "Landslide Susceptibility Prediction Modeling Based on Self-Screening Deep Learning Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fairness in AI has garnered quite some attention in research, and increasingly also in society. The so-called \"Impossibility Theorem\" has been one of the more striking research results with both theoretical and practical consequences, as it states that satisfying a certain combination of fairness measures is impossible. To date, this negative result has not yet been complemented with a positive one: a characterization of which combinations of fairness notions are possible. This work aims to fill this gap by identifying maximal sets of commonly used fairness measures that can be simultaneously satisfied.
The fairness measures used are demographic parity, equal opportunity, false positive parity, predictive parity, predictive equality, overall accuracy equality and treatment equality. We conclude that in total 12 maximal sets of these fairness measures are possible, among which seven combinations of two measures, and five combinations of three measures. Our work raises interesting questions regarding the practical relevance of each of these 12 maximal fairness notions in various scenarios.", "output": "Maximal Fairness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient.
Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on an MCU-based IoT node, with years of autonomous operation without battery recharging.", "output": "Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Energy-Based Models (EBMs) have been known in the Machine Learning community for decades. Since the seminal works devoted to EBMs dating back to the noughties, many efficient methods have appeared which solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited by few recent works (excluding WGAN-based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter.
We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long-run EBMs as a backbone of our Energy-guided Entropic OT method, leaving the application of more sophisticated EBMs for future research.", "output": "Energy-guided Entropic Neural Optimal Transport."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "N-body simulations are the most powerful method to study the non-linear evolution of large-scale structure. However, they require large amounts of computational resources, making their direct adoption unfeasible in scenarios that require broad explorations of parameter spaces. In this work, we show that it is possible to perform fast dark matter density field emulations with competitive accuracy using simple machine-learning approaches. We build an emulator based on dimensionality reduction and machine learning regression, combining simple Principal Component Analysis and supervised learning methods. For the estimations with a single free parameter, we train on the dark matter density parameter, $\\Omega_m$, while for emulations with two free parameters, we train on a range of $\\Omega_m$ and redshift. The method first adopts a projection of a grid of simulations on a given basis; then, a machine learning regression is trained on this projected grid. Finally, new density cubes for different cosmological parameters can be estimated without relying directly on new N-body simulations by predicting and de-projecting the basis coefficients. We show that the proposed emulator can generate density cubes at non-linear cosmological scales with density distributions within a few percent compared to the corresponding N-body simulations.
The method enables gains of three orders of magnitude in CPU run times compared to performing a full N-body simulation, while reproducing the power spectrum and bispectrum within $\\sim 1\\%$ and $\\sim 3\\%$, respectively, for the single free parameter emulation, and $\\sim 5\\%$ and $\\sim 15\\%$ for two free parameters. This can significantly accelerate the generation of density cubes for a wide variety of cosmological models, opening the doors to previously unfeasible applications, such as parameter and model inferences at full survey scales, as in the ESA/NASA Euclid mission.", "output": "Fast emulation of cosmological density fields based on dimensionality reduction and supervised machine-learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average.
This contrasts with current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.", "output": "Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject.
Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.", "output": "PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Short-form videos have exploded in popularity and have dominated the new social media trends. Prevailing short-video platforms, e.g., Kuaishou (Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way we consume and create content. For video content creation and understanding, shot boundary detection (SBD) is one of the most essential components in various scenarios. In this work, we release a new public Short video sHot bOundary deTection dataset, named SHOT, consisting of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos. Leveraging this new data wealth, we propose to optimize the model design for video SBD by conducting neural architecture search in a search space encapsulating various advanced 3D ConvNets and Transformers. Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when being derived and evaluated on our newly constructed SHOT dataset. Moreover, to validate the generalizability of the AutoShot architecture, we directly evaluate it on three other public datasets: ClipShots, BBC and RAI, and the F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%, 0.9% and 1.2%, respectively.
The SHOT dataset and code can be found in .", "output": "AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM, which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at ", "output": "Label-Free Concept Bottleneck Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep learning models have become the predominant method for medical image segmentation, they are typically not capable of generalizing to unseen segmentation tasks involving new anatomies, image modalities, or labels.
Given a new segmentation task, researchers generally have to train or fine-tune models, which is time-consuming and poses a substantial barrier for clinical researchers, who often lack the resources and expertise to train neural networks. We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. Given a query image and an example set of image-label pairs that define a new segmentation task, UniverSeg employs a new Cross-Block mechanism to produce accurate segmentation maps without the need for additional training. To achieve generalization to new tasks, we have gathered and standardized a collection of 53 open-access medical segmentation datasets with over 22,000 scans, which we refer to as MegaMedical. We used this collection to train UniverSeg on a diverse set of anatomies and imaging modalities. We demonstrate that UniverSeg substantially outperforms several related methods on unseen tasks, and thoroughly analyze and draw insights about important aspects of the proposed system. The UniverSeg source code and model weights are freely available at ", "output": "UniverSeg: Universal Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As the Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks for image classification, its interpretability remains an open research question.
This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images. We introduce the notion of evaluating the faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise Relevance Propagation for transformers outperforms Local Interpretable Model-agnostic Explanations and attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Our findings provide insights into the applicability of ViT explanations in medical imaging and highlight the importance of using appropriate evaluation criteria for comparing them.", "output": "Towards Evaluating Explanations of Vision Transformers for Medical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits).
Moreover, in text-conditional models, fixing those noise maps while changing the text prompt modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.", "output": "An Edit Friendly DDPM Noise Space: Inversion and Manipulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The growclusters package for R implements an enhanced version of k-means clustering that allows discovery of local clusterings or partitions for a collection of data sets that each draw their cluster means from a single, global partition. The package contains functions to estimate a partition structure for multivariate data. Estimation is performed under a penalized optimization derived from Bayesian non-parametric formulations. This paper describes some of the functions and capabilities of the growclusters package, including the creation of R Shiny applications designed to visually illustrate the operation and functionality of the growclusters package.", "output": "The growclusters Package for R."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we consider the problem of learning a neural network controller for a system required to satisfy a Signal Temporal Logic (STL) specification. We exploit STL quantitative semantics to define a notion of robust satisfaction. Guaranteeing the correctness of a neural network controller, i.e., ensuring the satisfaction of the specification by the controlled system, is a difficult problem that has received a lot of attention recently.
We provide a general procedure to construct a set of trainable High Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas in a fragment of STL. We use the BarrierNet, implemented by a differentiable Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural network controller, to guarantee the satisfaction of the STL formulas. We train the HOCBFs together with other neural network parameters to further improve the robustness of the controller. Simulation results demonstrate that our approach ensures satisfaction and outperforms existing algorithms.", "output": "Learning Robust and Correct Controllers from Signal Temporal Logic Specifications Using BarrierNet."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As more connected devices are implemented in a cyber-physical world and data is expected to be collected and processed in real time, the ability to handle time series data has become increasingly significant. To help analyze time series in data mining applications, many time series representation approaches have been proposed to convert a raw time series into another series for representing the original time series. However, existing approaches are not designed for open-ended time series (which is a sequence of data points being continuously collected at a fixed interval without any length limit) because these approaches need to know the total length of the target time series in advance and pre-process the entire time series using normalization methods. Furthermore, many representation approaches require users to configure and tune some parameters beforehand in order to achieve satisfactory representation results.
In this paper, we propose NP-Free, a real-time Normalization-free and Parameter-tuning-free representation approach for open-ended time series. Without needing to use any normalization method or tune any parameter, NP-Free can generate a representation for a raw time series on the fly by converting each data point of the time series into a root-mean-square error (RMSE) value based on Long Short-Term Memory (LSTM) and a Look-Back and Predict-Forward strategy. To demonstrate the capability of NP-Free in representing time series, we conducted several experiments based on real-world open-source time series datasets. We also evaluated the time consumption of NP-Free in generating representations.", "output": "NP-Free: A Real-Time Normalization-free and Parameter-tuning-free Representation Approach for Open-ended Time Series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating pairs of structures, i.e., reactant, TS, and product, in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of the hours required when performing quantum chemistry-based optimizations. The generated TS structures achieve an average error of 0.13 A root mean square deviation compared to true TS.
With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction rate estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision the proposed approach to be useful in constructing and pruning large reaction networks with unknown mechanisms.", "output": "Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness.
We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.", "output": "Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators; however, most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects to manipulate fluid flow.", "output": "SURFSUP: Learning Fluid Simulation for Novel Surfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Early warning systems (EWS) are prediction algorithms that have recently taken a central role in efforts to improve graduation rates in public schools across the US.
These systems assist in targeting interventions at individual students by predicting which students are at risk of dropping out. Despite significant investments and adoption, there remain significant gaps in our understanding of the efficacy of EWS. In this work, we draw on nearly a decade's worth of data from a system used throughout Wisconsin to provide the first large-scale evaluation of the long-term impact of EWS on graduation outcomes. We present evidence that risk assessments made by the prediction system are highly accurate, including for students from marginalized backgrounds. Despite the system's accuracy and widespread use, we find no evidence that it has led to improved graduation rates. We surface a robust statistical pattern that can explain why these seemingly contradictory insights hold. Namely, environmental features, measured at the level of schools, contain significant signal about dropout risk. Within each school, however, academic outcomes are essentially independent of individual student performance. This empirical observation indicates that assigning all students within the same school the same probability of graduation is a nearly optimal prediction. Our work provides an empirical backbone for the robust, qualitative understanding among education researchers and policy-makers that dropout is structurally determined.
The primary barrier to improving outcomes lies not in identifying students at risk of dropping out within specific schools, but rather in overcoming structural differences across different school districts. Our findings indicate that we should carefully evaluate the decision to fund early warning systems without also devoting resources to interventions tackling structural barriers.", "output": "Difficult Lessons on Social Prediction from Wisconsin Public Schools."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-time forecasting of travel demand during wildfire evacuations is crucial for emergency managers and transportation planners to make timely and better-informed decisions. However, few studies focus on accurate travel demand forecasting in large-scale emergency evacuations. Therefore, this study develops and tests a new methodological framework for modeling trip generation in wildfire evacuations by using (a) large-scale GPS data generated by mobile devices and (b) state-of-the-art AI technologies. The proposed methodology aims at forecasting evacuation trips and other types of trips. Based on the travel demand inferred from the GPS data, we develop a new deep learning model, i.e., Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN), along with a model updating scheme to achieve real-time forecasting of travel demand during wildfire evacuations. The proposed methodological framework is tested in this study for a real-world case study: the 2019 Kincade Fire in Sonoma County, CA. The results show that SA-MGCRN significantly outperforms all the selected state-of-the-art benchmarks in terms of prediction performance.
Our findings suggest that the most important model components of SA-MGCRN are evacuation order/warning information, proximity to fire, and population change, which are consistent with behavioral theories and empirical findings.", "output": "Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our recent intensive study has found that physics-informed neural networks (PINN) tend to be local approximators after training. This observation leads to this novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis \"activation\" function. Under appropriate conditions, we demonstrated that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. Besides, we studied the training dynamics of PIRBN via the neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, the existing PINN numerical techniques, such as adaptive learning, decomposition and different types of loss functions, are applicable to PIRBN.
The programs that can regenerate all numerical results can be found at ", "output": "Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate delineation of key waveforms in an ECG is a critical initial step in extracting relevant features to support the diagnosis and treatment of heart conditions. Although deep learning based methods using a segmentation model to locate P, QRS and T waves have shown promising results, their ability to handle signals exhibiting arrhythmia remains unclear. In this study, we propose a novel approach that leverages a deep learning model to accurately delineate signals with a wide range of arrhythmia. Our approach involves training a segmentation model using a hybrid loss function that combines segmentation with the task of arrhythmia classification. In addition, we use a diverse training set containing various arrhythmia types, enabling our model to handle a wider range of challenging cases. Experimental results show that our model accurately delineates signals with a broad range of abnormal rhythm types, and the combined training with classification guidance can effectively reduce false positive P wave predictions, particularly during atrial fibrillation and atrial flutter. Furthermore, our proposed method shows competitive performance with previous delineation algorithms on the Lobachevsky University Database (LUDB).", "output": "An Arrhythmia Classification-Guided Segmentation Model for Electrocardiogram Delineation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural image compression methods have seen increasingly strong performance in recent years.
However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which stands in the way of real-world deployment. This paper takes a step forward in closing this gap in decoding complexity by adopting shallow or even linear decoding transforms. To compensate for the resulting drop in compression performance, we exploit the often asymmetrical computation budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding. We theoretically formalize the intuition behind, and our experimental results establish a new frontier in the trade-off between rate-distortion and decoding complexity for neural image compression. Specifically, we achieve rate-distortion performance competitive with the established mean-scale hyperprior architecture of Minnen et al. (2018), while reducing the overall decoding complexity by 80 %, or over 90 % for the synthesis transform alone. Our code can be found at ", "output": "Asymmetrically-powered Neural Image Compression with Shallow Decoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) demonstrate great performance in compound property and activity prediction due to their capability to efficiently learn complex molecular graph structures. However, two main limitations persist including compound representation and model interpretability. While atom-level molecular graph representations are commonly used because of their ability to capture natural topology, they may not fully express important substructures or functional groups which significantly influence molecular properties. Consequently, recent research proposes alternative representations employing reduction techniques to integrate higher-level information and leverages both representations for model learning.
However, there is still a lack of study about different molecular graph representations on model learning and interpretation. Interpretability is also crucial for drug discovery as it can offer chemical insights and inspiration for optimization. Numerous studies attempt to include model interpretation to explain the rationale behind predictions, but most of them focus solely on individual prediction with little analysis of the interpretation on different molecular graph representations. This research introduces multiple molecular graph representations that incorporate higher-level information and investigates their effects on model learning and interpretation from diverse perspectives. The results indicate that combining atom graph representation with reduced molecular graph representation can yield promising model performance. Furthermore, the interpretation results can provide significant features and potential substructures consistently aligning with background knowledge. These multiple molecular graph representations and interpretation analysis can bolster model comprehension and facilitate relevant applications in drug discovery.", "output": "Enhancing Model Learning and Interpretation Using Multiple Molecular Graph Representations for Compound Property and Activity Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input image patches with those prototypes.
We propose the first medical prototype network (MProtoNet) to extend ProtoPNet to brain tumor classification with 3D multi-parametric magnetic resonance imaging (mpMRI) data. To address different requirements between 2D natural images and 3D mpMRIs especially in terms of localizing attention regions, a new attention module with soft masking and online-CAM loss is introduced. Soft masking helps sharpen attention maps, while online-CAM loss directly utilizes image-level labels when training the attention module. MProtoNet achieves statistically significant improvements in interpretability metrics of both correctness and localization coherence (with a best activation precision of $0.713\\pm0.058$) without human-annotated labels during training, when compared with GradCAM and several ProtoPNet variants. The source code is available at", "output": "MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A method for the nonintrusive and structure-preserving model reduction of canonical and noncanonical Hamiltonian systems is presented. Based on the idea of operator inference, this technique is provably convergent and reduces to a straightforward linear solve given snapshot data and gray-box knowledge of the system Hamiltonian.
Examples involving several hyperbolic partial differential equations show that the proposed method yields reduced models which, in addition to being accurate and stable with respect to the addition of basis modes, preserve conserved quantities well outside the range of their training data.", "output": "Canonical and Noncanonical Hamiltonian Operator Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets.
We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.", "output": "Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability.
We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.", "output": "Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively by using an EM algorithm. Our simulation and experiment results show that the improved Naive Bayes method greatly improves the performances of the Naive Bayes method with mislabeled data.", "output": "Improved Naive Bayes with Mislabeled Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Soft-sensors are gaining popularity due to their ability to provide estimates of key process variables with little intervention required on the asset and at a low cost. In oil and gas production, virtual flow metering (VFM) is a popular soft-sensor that attempts to estimate multiphase flow rates in real time. VFMs are based on models, and these models require calibration. The calibration is highly dependent on the application, both due to the great diversity of the models, and in the available measurements.
The most accurate calibration is achieved by careful tuning of the VFM parameters to well tests, but this can be work intensive, and not all wells have frequent well test data available. This paper presents a calibration method based on the measurement provided by the production separator, and the assumption that the observed flow should be equal to the sum of flow rates from each individual well. This allows us to jointly calibrate the VFMs continuously. The method applies Sequential Monte Carlo (SMC) to infer a tuning factor and the flow composition for each well. The method is tested on a case with ten wells, using both synthetic and real data. The results are promising and the method is able to provide reasonable estimates of the parameters without relying on well tests. However, some challenges are identified and discussed, particularly related to the process noise and how to manage varying data quality.", "output": "Sequential Monte Carlo applied to virtual flow meter calibration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial training and data augmentation with noise are widely adopted techniques to enhance the performance of neural networks. This paper investigates adversarial training and data augmentation with noise in the context of regularized regression in a reproducing kernel Hilbert space (RKHS). We establish the limiting formula for these techniques as the attack and noise size, as well as the regularization parameter, tend to zero. Based on this limiting formula, we analyze specific scenarios and demonstrate that, without appropriate regularization, these two methods may have larger generalization error and Lipschitz constant than standard kernel regression.
However, by selecting the appropriate regularization parameter, these two methods can outperform standard kernel regression and achieve smaller generalization error and Lipschitz constant. These findings support the empirical observations that adversarial training can lead to overfitting, and appropriate regularization methods, such as early stopping, can alleviate this issue.", "output": "Understanding Overfitting in Adversarial Training in Kernel Regression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When choosing between competing symbolic models for a data set, a human will naturally prefer the \"simpler\" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on an $n$-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection.
We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.", "output": "Priors for symbolic regression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Falls are the public health issue for the elderly all over the world since the fall-induced injuries are associated with a large amount of healthcare cost. Falls can cause serious injuries, even leading to death if the elderly suffers a \"long-lie\". Hence, a reliable fall detection (FD) system is required to provide an emergency alarm for first aid. Due to the advances in wearable device technology and artificial intelligence, some fall detection systems have been developed using machine learning and deep learning methods to analyze the signal collected from accelerometer and gyroscopes. In order to achieve better fall detection performance, an ensemble model that combines a coarse-fine convolutional neural network and gated recurrent unit is proposed in this study. The parallel structure design used in this model restores the different grains of spatial characteristics and capture temporal dependencies for feature representation. This study applies the FallAllD public dataset to validate the reliability of the proposed model, which achieves a recall, precision, and F-score of 92.54%, 96.13%, and 94.26%, respectively.
The results demonstrate the reliability of the proposed ensemble model in discriminating falls from daily living activities and its superior performance compared to the state-of-the-art convolutional neural network long short-term memory (CNN-LSTM) for FD.", "output": "Deep Learning-based Fall Detection Algorithm Using Ensemble Model of Coarse-fine CNN and GRU Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Heterogeneous graph neural networks aim to discover discriminative node embeddings and relations from multi-relational networks. One challenge of heterogeneous graph learning is the design of learnable meta-paths, which significantly influences the quality of learned embeddings. Thus, in this paper, we propose an Attributed Multi-Order Graph Convolutional Network (AMOGCN), which automatically studies meta-paths containing multi-hop neighbors from an adaptive aggregation of multi-order adjacency matrices. The proposed model first builds different orders of adjacency matrices from manually designed node connections. After that, an intact multi-order adjacency matrix is attached from the automatic fusion of various orders of adjacency matrices. This process is supervised by the node semantic information, which is extracted from the node homophily evaluated by attributes. Eventually, we utilize a one-layer simplifying graph convolutional network with the learned multi-order adjacency matrix, which is equivalent to the cross-hop node information propagation with multi-layer graph neural networks.
Substantial experiments reveal that AMOGCN gains superior semi-supervised classification performance compared with state-of-the-art competitors.", "output": "Attributed Multi-order Graph Convolutional Network for Heterogeneous Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process. The proposed framework enables swift and robust integration of new datasets, experimentation on different algorithms, and selection of the best models. We start with the datasets of different issues and apply pre-processing steps to clean and engineer meaningful representations of time-series data. To identify robust training configurations, we introduce a novel mechanism of multiple cross-validation strategies. We apply different evaluation metrics to find the best-suited models for varying applications. One of the referent applications is our participation in the intelligent forecasting competition held by the United States Agency of International Development (USAID). Finally, we leverage the flexibility of the framework by applying different evaluation metrics to assess the performance of the models in inventory management settings.", "output": "Streamlined Framework for Agile Forecasting Model Development towards Efficient Inventory Management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective quantification of uncertainty is an essential and still missing step towards a greater adoption of deep-learning approaches in different applications, including mission-critical ones.
In particular, investigations on the predictive uncertainty of deep-learning models describing non-linear dynamical systems are very limited to date. This paper is aimed at filling this gap and presents preliminary results on uncertainty quantification for system identification with neural state-space models. We frame the learning problem in a Bayesian probabilistic setting and obtain posterior distributions for the neural network's weights and outputs through approximate inference techniques. Based on the posterior, we construct credible intervals on the outputs and define a surprise index which can effectively diagnose usage of the model in a potentially dangerous out-of-distribution regime, where predictions cannot be trusted.", "output": "Neural State-Space Models: Empirical Evaluation of Uncertainty Quantification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both.
The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.", "output": "IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sepsis is a life-threatening organ malfunction caused by the host's inability to fight infection, which can lead to death without proper and immediate treatment. Therefore, early diagnosis and medical treatment of sepsis in critically ill populations at high risk for sepsis and sepsis-associated mortality are vital to providing the patient with rapid therapy. Studies show that advancing sepsis detection by 6 hours leads to earlier administration of antibiotics, which is associated with improved mortality. However, clinical scores like Sequential Organ Failure Assessment (SOFA) are not applicable for early prediction, while machine learning algorithms can help capture the progressing pattern for early prediction. Therefore, we aim to develop a machine learning algorithm that predicts sepsis onset 6 hours before it is suspected clinically. Although some machine learning algorithms have been applied to sepsis prediction, many of them did not consider the fact that six hours is not a small gap. To overcome this big gap challenge, we explore a multi-subset approach in which the likelihood of sepsis occurring earlier than 6 hours is output from a previous subset and fed to the target subset as additional features.
Moreover, we use the hourly sampled data like vital signs in an observation window to derive a temporal change trend to further assist, which however is often ignored by previous studies. Our empirical study shows that both the multi-subset approach to alleviating the 6-hour gap and the added temporal trend features can help improve the performance of sepsis-related early prediction.", "output": "Multi-Subset Approach to Early Sepsis Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method, by introducing a faithfulness task, and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open source our implementation:", "output": "VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Conventional non-differentiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. They both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where the ultrafast automated model compression for various computational cost constraint is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy.
Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.", "output": "Learning Accurate Performance Predictors for Ultrafast Automated Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a comprehensive, multi-stage machine learning methodology that effectively integrates information systems and artificial intelligence to enhance decision-making processes within the domain of operations research. The proposed framework adeptly addresses common limitations of existing solutions, such as the neglect of data-driven estimation for vital production parameters, exclusive generation of point forecasts without considering model uncertainty, and lacking explanations regarding the sources of such uncertainty. Our approach employs Quantile Regression Forests for generating interval predictions, alongside both local and global variants of SHapley Additive Explanations for the examined predictive process monitoring problem. The practical applicability of the proposed methodology is substantiated through a real-world production planning case study, emphasizing the potential of prescriptive analytics in refining decision-making procedures.
This paper accentuates the imperative of addressing these challenges to fully harness the extensive and rich data resources accessible for well-informed decision-making.", "output": "Quantifying and Explaining Machine Learning Uncertainty in Predictive Process Monitoring: An Operations Research Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a systematic investigation into the effectiveness of Self-Supervised Learning (SSL) methods for Electrocardiogram (ECG) arrhythmia detection. We begin by conducting a novel distribution analysis on three popular ECG-based arrhythmia datasets: PTB-XL, Chapman, and Ribeiro. To the best of our knowledge, our study is the first to quantify these distributions in this area. We then perform a comprehensive set of experiments using different augmentations and parameters to evaluate the effectiveness of various SSL methods, namely SimCLR, BYOL, and SwAV, for ECG representation learning, where we observe the best performance achieved by SwAV. Furthermore, our analysis shows that SSL methods achieve highly competitive results to those achieved by supervised state-of-the-art methods. To further assess the performance of these methods on both In-Distribution (ID) and Out-of-Distribution (OOD) ECG data, we conduct cross-dataset training and testing experiments. Our comprehensive experiments show almost identical results when comparing ID and OOD schemes, indicating that SSL techniques can learn highly effective representations that generalize well across different OOD datasets. This finding can have major implications for ECG-based arrhythmia detection.
Lastly, to further analyze our results, we perform detailed per-disease studies on the performance of the SSL methods on the three datasets.", "output": "In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision transformers have been applied successfully for image recognition tasks. There have been either multi-headed self-attention based (ViT \\cite{dosovitskiy2020image}, DeIT \\cite{touvron2021training}) similar to the original work in textual models or more recently based on spectral layers (Fnet \\cite{lee2021fnet}, GFNet \\cite{rao2021global}, AFNO \\cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention plays a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel Spectformer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, Spectformer-L achieves 85.7% that is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Stanford Car datasets.
We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that Spectformer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.", "output": "SpectFormer: Frequency and Attention is what you need in a Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Preserving the individuals' privacy in sharing spatial-temporal datasets is critical to prevent re-identification attacks based on unique trajectories. Existing privacy techniques tend to propose ideal privacy-utility tradeoffs; however, they largely ignore the fairness implications of mobility models and whether such techniques perform equally for different groups of users. The quantification between fairness and privacy-aware models is still unclear and there barely exist any defined sets of metrics for measuring fairness in the spatial-temporal context. In this work, we define a set of fairness metrics designed explicitly for human mobility, based on structural similarity and entropy of the trajectories. Under these definitions, we examine the fairness of two state-of-the-art privacy-preserving models that rely on GAN and representation learning to reduce the re-identification rate of users for data sharing. Our results show that while both models guarantee group fairness in terms of demographic parity, they violate individual fairness criteria, indicating that users with highly similar trajectories receive disparate privacy gain.
We conclude that the tension between the re-identification task and individual fairness needs to be considered for future spatial-temporal data analysis and modelling to achieve a privacy-preserving fairness-aware setting.", "output": "Analysing Fairness of Privacy-Utility Mobility Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifying EEG data is integral to the performance of Brain Computer Interfaces (BCI) and their applications. However, external noise often obstructs EEG data due to its biological nature and complex data collection process. Especially when dealing with classification tasks, standard EEG preprocessing approaches extract relevant events and features from the entire dataset. However, these approaches treat all relevant cognitive events equally and overlook the dynamic nature of the brain over time. In contrast, we are inspired by neuroscience studies to use a novel approach that integrates feature selection and time segmentation of EEG data. When tested on the EEGEyeNet dataset, our proposed method significantly increases the performance of Machine Learning classifiers while reducing their respective computational complexity.", "output": "Two Heads are Better than One: A Bio-inspired Method for Improving Classification on EEG-ET Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wi-Fi channel state information (CSI) has become a promising solution for non-invasive breathing and body motion monitoring during sleep. Sleep disorders of apnea and periodic limb movement disorder (PLMD) are often unconscious and fatal. Existing research detects abnormal sleep disorders in impractically controlled environments.
Moreover, it leads to compelling challenges to classify complex macro- and micro-scales of sleep movements as well as entangled similar waveforms of cases of apnea and PLMD. In this paper, we propose the attention-based learning for sleep apnea and limb movement detection (ALESAL) system that can jointly detect sleep apnea and PLMD under different sleep postures across a variety of patients. ALESAL contains antenna-pair and time attention mechanisms for mitigating the impact of modest antenna pairs and emphasizing the duration of interest, respectively. Performance results show that our proposed ALESAL system can achieve a weighted F1-score of 84.33, outperforming the other existing non-attention based methods of support vector machine and deep multilayer perceptron.", "output": "Attention-based Learning for Sleep Apnea and Limb Movement Detection using Wi-Fi CSI Signals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electrocardiography is the most common method to investigate the condition of the heart through the observation of cardiac rhythm and electrical activity, for both diagnosis and monitoring purposes. Analysis of electrocardiograms (ECGs) is commonly performed through the investigation of specific patterns, which are visually recognizable by trained physicians and are known to reflect cardiac (dis)function. In this work we study the use of $\\beta$-variational autoencoders (VAEs) as an explainable feature extractor, and improve on its predictive capacities by jointly optimizing signal reconstruction and cardiac function prediction. The extracted features are then used for cardiac function prediction using logistic regression. The method is trained and tested on data from 7255 patients, who were treated for acute coronary syndrome at the Leiden University Medical Center between 2010 and 2021.
The results show that our method significantly improved prediction and explainability compared to a vanilla $\\beta$-VAE, while still yielding similar reconstruction performance.", "output": "Joint optimization of a $\\beta$-VAE for ECG task-specific feature extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Non-Fungible Tokens (NFTs) are non-interchangeable assets, usually digital art, which are stored on the blockchain. Preliminary studies find that female and darker-skinned NFTs are valued less than their male and lighter-skinned counterparts. However, these studies analyze only the CryptoPunks collection. We test the statistical significance of race and gender biases in the prices of CryptoPunks and present the first study of gender bias in the broader NFT market. We find evidence of racial bias but not gender bias. Our work also introduces a dataset of gender-labeled NFT collections to advance the broader study of social equity in this emerging market.", "output": "Exploring Gender and Race Biases in the NFT Market."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, often multiple sources of information are available beyond EEG. This can be particularly beneficial when the EEG recordings are noisy or even missing completely.
In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modality segments, allowing training on incomplete data. Additionally, it shows state-of-the-art performance when testing on both multimodal and unimodal data using a single model on SHHS-1, the largest publicly available study that includes sleep stage labels. The results indicate that training the model on multimodal data does positively influence performance when tested on unimodal data. This work aims at bridging the gap between automated analysis tools and their clinical utility.", "output": "CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a.
AI-generated content), which will be a significant milestone for the development of AGI.", "output": "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning-based wearable human activity recognition (WHAR) models enable the development of various smart and connected community applications such as sleep pattern monitoring, medication reminders, cognitive health assessment, sports analytics, etc. However, the widespread adoption of these WHAR models is impeded by their degraded performance in the presence of data distribution heterogeneities caused by the sensor placement at different body positions, inherent biases and heterogeneities across devices, and personal and environmental diversities. Various traditional machine learning algorithms and transfer learning techniques have been proposed in the literature to address the underpinning challenges of handling such data heterogeneities. Domain adaptation is one such transfer learning technique that has gained significant popularity in recent literature. In this paper, we survey the recent progress of domain adaptation techniques in the Inertial Measurement Unit (IMU)-based human activity recognition area, and discuss potential future directions.", "output": "Domain Adaptation for Inertial Measurement Unit-based Human Activity Recognition: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The demand for device-free indoor localization using commercial Wi-Fi devices has rapidly increased in various fields due to its convenience and versatile applications.
However, random frequency offset (RFO) in wireless channels poses challenges to the accuracy of indoor localization when using fluctuating channel state information (CSI). To mitigate the RFO problem, an error vector spectrum (EVS) is conceived thanks to its higher resolution of signal and robustness to RFO. To address these challenges, this paper proposes a novel error vector assisted learning (EVAL) scheme for device-free indoor localization. The proposed EVAL scheme employs deep neural networks to classify the location of a person in the indoor environment by extracting ample channel features from the physical layer signals. We conducted realistic experiments based on the OpenWiFi project to extract both EVS and CSI to examine the performance of different device-free localization techniques. Experimental results show that our proposed EVAL scheme outperforms conventional machine learning methods and benchmarks utilizing either CSI amplitude or phase information. Compared to most existing CSI-based localization schemes, a new paradigm with higher positioning accuracy by adopting EVS is revealed by our proposed EVAL system.", "output": "A New Paradigm for Device-free Indoor Localization: Deep Learning with Error Vector Spectrum in Wi-Fi Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Various faults can occur during the operation of PV arrays, and both the dust-affected operating conditions and various diode configurations make the faults more complicated. However, current methods for fault diagnosis based on I-V characteristic curves only utilize partial feature information and often rely on calibrating the field characteristic curves to standard test conditions (STC).
It is difficult to apply in practice and to accurately identify multiple complex faults with similarities in different blocking diode configurations of PV arrays under the influence of dust. Therefore, a novel fault diagnosis method for PV arrays considering dust impact is proposed. In the preprocessing stage, the Isc-Voc normalized Gramian angular difference field (GADF) method is presented, which normalizes and transforms the resampled PV array characteristic curves from the field, including I-V and P-V, to obtain the transformed graphical feature matrices. Then, in the fault diagnosis stage, the model of convolutional neural network (CNN) with convolutional block attention modules (CBAM) is designed to extract fault differentiation information from the transformed graphical matrices containing full feature information and to classify faults. Different graphical feature transformation methods are compared through simulation cases, and different CNN-based classification methods are also analyzed. The results indicate that the developed method for PV arrays with different blocking diode configurations under various operating conditions has high fault diagnosis accuracy and reliability.", "output": "Fault diagnosis for PV arrays considering dust impact based on transformed graphical feature of characteristic curves and convolutional neural network with CBAM modules."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neurophysiological time series recordings like the electroencephalogram (EEG) or local field potentials are obtained from multiple sensors. They can be decoded by machine learning models in order to estimate the ongoing brain state of a patient or healthy user.
In a brain-computer interface (BCI), this decoded brain state information can be used with minimal time delay to either control an application, e.g., for communication or for rehabilitation after stroke, or to passively monitor the ongoing brain state of the subject, e.g., in a demanding work environment. A specific challenge in such decoding tasks is posed by the small dataset sizes in BCI compared to other domains of machine learning like computer vision or natural language processing. A possibility to tackle classification or regression problems in BCI despite small training datasets is through transfer learning, which utilizes data from other sessions, subjects or even datasets to train a model. In this exploratory study, we propose novel domain-specific embeddings for neurophysiological data. Our approach is based on metric learning and builds upon the recently proposed ladder loss. Using embeddings allowed us to benefit, both from the good generalisation abilities and robustness of deep learning and from the fast training of classical machine learning models for subject-specific calibration. In offline analyses using EEG data of 14 subjects, we tested the embeddings' feasibility and compared their efficiency with state-of-the-art deep learning models and conventional machine learning pipelines. In summary, we propose the use of metric learning to obtain pre-trained embeddings of EEG-BCI data as a means to incorporate domain knowledge and to reach competitive performance on novel subjects with minimal calibration requirements.", "output": "An embedding for EEG signals learned using a triplet loss."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance.
However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based data augmentation method is developed to generate more valid samples for model learning. Second, a semi-supervised two-step pairwise learning method is proposed to bridge prototype-wise and instance-wise pairwise learning, where the prototype-wise pairwise learning measures the global relationship between EEG data and the prototypical representation of each emotion class and the instance-wise pairwise learning captures the local intrinsic relationship among EEG data. Third, a semi-supervised multi-domain adaptation is introduced to align the data representation among multiple domains (labeled source domain, unlabeled source domain, and target domain), where the distribution mismatch is alleviated. Extensive experiments are conducted on two benchmark databases (SEED and SEED-IV) under a cross-subject leave-one-subject-out cross-validation evaluation protocol. The results show the proposed EEGMatch performs better than the state-of-the-art methods under different incomplete label conditions (with 6.89% improvement on SEED and 1.44% improvement on SEED-IV), which demonstrates the effectiveness of the proposed EEGMatch in dealing with the label scarcity problem in emotion recognition using EEG signals. The source code is available at", "output": "EEGMatch: Learning with Incomplete Labels for Semi-Supervised EEG-based Cross-Subject Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional neural networks learn spatial features and are heavily interlinked within kernels.
The SE module has broken the traditional route of neural networks passing the entire result to the next layer. Instead, SE only passes important features to be learned with its squeeze and excitation (SE) module. We propose variations of the SE module which improve the process of squeeze and excitation and enhance the performance. The proposed squeezing or exciting of the layer makes it possible to have a smooth transition of layer weights. These proposed variations also retain the characteristics of the SE module. The experimental results are carried out on residual networks and the results are tabulated.", "output": "Variations of Squeeze and Excitation networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This chapter presents some of the fundamental assumptions and principles that could form the philosophical foundation of GeoAI and spatial data science. Instead of reviewing the well-established characteristics of spatial data (analysis), including interaction, neighborhoods, and autocorrelation, the chapter highlights themes such as sustainability, bias in training data, diversity in schema knowledge, and the (potential lack of) neutrality of GeoAI systems from a unifying ethical perspective.
Reflecting on our profession's ethical implications will assist us in conducting potentially disruptive research more responsibly, identifying pitfalls in designing, training, and deploying GeoAI-based systems, and developing a shared understanding of the benefits but also potential dangers of artificial intelligence and machine learning research across academic fields, all while sharing our unique (geo)spatial perspective with others.", "output": "Philosophical Foundations of GeoAI: Exploring Sustainability, Diversity, and Bias in GeoAI and Spatial Data Science."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cellular user positioning is a promising service provided by Fifth Generation New Radio (5G NR) networks. Besides, Machine Learning (ML) techniques are foreseen to become an integrated part of 5G NR systems, improving radio performance and reducing complexity. In this paper, we investigate ML techniques for positioning using 5G NR fingerprints consisting of uplink channel estimates from the physical layer channel. We show that it is possible to use Sounding Reference Signals (SRS) channel fingerprints to provide sufficient data to infer user position. Furthermore, we show that small fully-connected moderately Deep Neural Networks, even when applied to very sparse SRS data, can achieve successful outdoor user positioning with meter-level accuracy in a commercial 5G environment.", "output": "ML-Enabled Outdoor User Positioning in 5G NR Systems via Uplink SRS Channel Estimates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers reliable and secure Spectrum Sensing (SS) based on Federated Learning (FL) in the Cognitive Radio (CR) environment.
Motivation, architectures, and algorithms of FL in SS are discussed. Security and privacy threats on these algorithms are overviewed, along with possible countermeasures to such attacks. Some illustrative examples are also provided, with design recommendations for FL-based SS in future CRs.", "output": "Secure Federated Learning for Cognitive Radio Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes them to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully.
The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.", "output": "An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks have attracted extensive attention from researchers in many fields due to their brain-like information processing mechanism. The proposal of surrogate gradients enables spiking neural networks to migrate to more complex tasks and gradually close the gap with conventional artificial neural networks. Current spiking neural networks utilize the output of all moments to produce the final prediction, which compromises their temporal characteristics and causes a reduction in performance and efficiency. We propose a temporal knowledge sharing approach (TKS) that enables the interaction of information between different moments, by selecting the output of specific moments to compose teacher signals to guide the training of the network along with the real labels. We have validated TKS on the static datasets CIFAR10, CIFAR100, and ImageNet-1k, and the neuromorphic datasets DVS-CIFAR10 and NCALTECH101. Our experimental results indicate that we have achieved the current optimal performance in comparison with other algorithms. Experiments on fine-grained classification datasets further demonstrate our algorithm's superiority on CUB-200-2011, StanfordDogs, and StanfordCars. The TKS algorithm helps the model to have stronger temporal generalization capability, allowing the network to guarantee performance with large time steps in the training phase and with small time steps in the testing phase.
This greatly facilitates the deployment of SNNs on edge devices.", "output": "Temporal Knowledge Sharing enable Spiking Neural Network Learning from Past and Future."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances.
Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.", "output": "Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unmanned aerial vehicles (UAV) or drones play many roles in a modern smart city such as the delivery of goods, mapping real-time road traffic and monitoring pollution. The ability of drones to perform these functions often requires the support of machine learning technology. However, traditional machine learning models for drones encounter data privacy problems, communication costs and energy limitations. Federated Learning, an emerging distributed machine learning approach, is an excellent solution to address these issues. Federated learning (FL) allows drones to train local models without transmitting raw data. However, existing FL requires a central server to aggregate the trained model parameters of the UAV. A failure of the central server can significantly impact the overall training. In this paper, we propose two aggregation methods: Commutative FL and Alternate FL, based on the existing architecture of decentralised Federated Learning for UAV Networks (DFL-UN) by adding a unique aggregation method of decentralised FL. Those two methods can effectively control energy consumption and communication cost by controlling the number of local training epochs, local communication, and global communication. The simulation results of the proposed training methods are also presented to verify the feasibility and efficiency of the architecture compared with two benchmark methods (e.g. standard machine learning training and standard single aggregation server training).
The simulation results show that the proposed methods outperform the benchmark methods in terms of operational stability, energy consumption and communication cost.", "output": "Decentralized federated learning methods for reducing communication cost and energy consumption in UAV networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning (DRL) has demonstrated its potential in solving complex manufacturing decision-making problems, especially in a context where the system learns over time with actual operation in the absence of training data. One interesting and challenging application for such methods is the assembly sequence planning (ASP) problem. In this paper, we propose an approach to the implementation of DRL methods in ASP. The proposed approach introduces parametric actions in the RL environment to improve training time and sample efficiency and uses two different reward signals: (1) user's preferences and (2) total assembly time duration. The user's preferences signal addresses the difficulties and non-ergonomic properties of the assembly faced by the human and the total assembly time signal enforces the optimization of the assembly. Three of the most powerful deep RL methods were studied, Advantage Actor-Critic (A2C), Deep Q-Learning (DQN), and Rainbow, in two different scenarios: a stochastic and a deterministic one. Finally, the performance of the DRL algorithms was compared to tabular Q-Learning's performance. After 10,000 episodes, the system achieved near optimal behaviour for the algorithms tabular Q-Learning, A2C, and Rainbow. Though, for more complex scenarios, the algorithm tabular Q-Learning is expected to underperform in comparison to the other two algorithms.
The results support the potential for the application of deep reinforcement learning in assembly sequence planning problems with human interaction.", "output": "Deep reinforcement learning applied to an assembly sequence planning problem with user preferences."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Counterfactual explanation methods provide information on how feature values of individual observations must be changed to obtain a desired prediction. Despite the increasing number of proposed methods in research, only a few implementations exist whose interfaces and requirements vary widely. In this work, we introduce the counterfactuals R package, which provides a modular and unified R6-based interface for counterfactual explanation methods. We implemented three existing counterfactual explanation methods and propose some optional methodological extensions to generalize these methods to different scenarios and to make them more comparable. We explain the structure and workflow of the package using real use cases and show how to integrate additional counterfactual explanation methods into the package.
In addition, we compared the implemented methods for a variety of models and datasets with regard to the quality of their counterfactual explanations and their runtime behavior.", "output": "counterfactuals: An R Package for Counterfactual Explanation Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a classifier with noisy labels typically requires the learner to specify the distribution of label noise, which is often unknown in practice. Although there have been some recent attempts to relax that requirement, we show that the Bayes decision rule is unidentified in most classification problems with noisy labels. This suggests it is generally not possible to bypass/relax the requirement. In the special cases in which the Bayes decision rule is identified, we develop a simple algorithm to learn the Bayes decision rule that does not require knowledge of the noise distribution.", "output": "Bayes classifier cannot be learned from noisy responses with unknown noise rates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial examples have been found for various deep as well as shallow learning models, and have at various times been suggested to be either fixable model-specific bugs, or else an inherent dataset feature, or both.
We present theoretical and empirical results to show that adversarial examples are approximate discontinuities resulting from models that specify approximately bijective maps $f: \\Bbb R^n \\to \\Bbb R^m; n \\neq m$ over their inputs, and this discontinuity follows from the topological invariance of dimension.", "output": "Adversarial Examples from Dimensional Invariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models have become ubiquitous in modern life, finding applications in various domains such as natural language processing, language translation, and speech recognition. Recently, a breakthrough work [Zhao, Panigrahi, Ge, and Arora Arxiv 2023] explains the attention model from probabilistic context-free grammar (PCFG). One of the central computation tasks for computing probability in PCFG is formulating a particular tensor low rank approximation problem, which we can call tensor cycle rank. Given an $n \\times n \\times n$ third order tensor $A$, we say that $A$ has cycle rank-$k$ if there exist three $n \\times k^2$ size matrices $U, V$, and $W$ such that for each entry \\begin{align*} A_{a,b,c} = \\sum_{i=1}^k \\sum_{j=1}^k \\sum_{l=1}^k U_{a,i+k(j-1)} \\otimes V_{b, j + k(l-1)} \\otimes W_{c, l + k(i-1)} \\end{align*} for all $a \\in [n], b \\in [n], c \\in [n]$. For the tensor classical rank, Tucker rank, and train rank, it has been well studied in [Song, Woodruff, Zhong SODA 2019].
In this paper, we generalize the previous ``rotation and sketch'' technique on page 186 of [Song, Woodruff, Zhong SODA 2019] and show an input sparsity time algorithm for cycle rank.", "output": "Solving Tensor Low Cycle Rank Approximation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms play an important role in a variety of important decision-making processes, including targeted advertisement displays, home loan approvals, and criminal behavior predictions. Given the far-reaching impact of these algorithms, it is crucial that they operate fairly, free from bias or prejudice towards certain groups in the population. Ensuring impartiality in these algorithms is essential for promoting equality and avoiding discrimination. To this end, we introduce a unified framework for randomized subset selection that incorporates group fairness constraints. Our problem involves a global utility function and a set of group utility functions for each group, where a group refers to a group of individuals (e.g., people) sharing the same attributes (e.g., gender). Our aim is to generate a distribution across feasible subsets, specifying the selection probability of each feasible set, to maximize the global utility function while meeting a predetermined quota for each group utility function in expectation. Note that there may not necessarily be any direct connections between the global utility function and each group utility function. We demonstrate that this framework unifies and generalizes many significant applications in machine learning and operations research.
Our algorithmic results either improve the best known result or provide the first approximation algorithms for new applications.", "output": "Beyond Submodularity: A Unified Framework of Randomized Set Selection with Group Fairness Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task, thus leading to a loss of the versatility of the original model. We introduce \"lossless adaptation\" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter-efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation, thus preserving the original capabilities of the pretrained model.
We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.", "output": "Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Without access to the source data, source-free domain adaptation (SFDA) transfers knowledge from a source-domain trained model to target domains. Recently, SFDA has gained popularity due to the need to protect the data privacy of the source domain, but it suffers from catastrophic forgetting on the source domain due to the lack of data. To systematically investigate the mechanism of catastrophic forgetting, we first reimplement previous SFDA approaches within a unified framework and evaluate them on four benchmarks. We observe that there is a trade-off between adaptation gain and forgetting loss, which motivates us to design a consistency regularization to mitigate forgetting. In particular, we propose a continual source-free domain adaptation approach named CoSDA, which employs a dual-speed optimized teacher-student model pair and is equipped with consistency learning capability. Our experiments demonstrate that CoSDA outperforms state-of-the-art approaches in continuous adaptation.
Notably, our CoSDA can also be integrated with other SFDA methods to alleviate forgetting.", "output": "CoSDA: Continual Source-Free Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent approaches have attempted to personalize dialogue systems by leveraging profile information into models. However, this knowledge is scarce and difficult to obtain, which makes the extraction/generation of profile information from dialogues a fundamental asset. To surpass this limitation, we introduce the Profile Generation Task (PGTask). We contribute with a new dataset for this problem, comprising profile sentences aligned with related utterances, extracted from a corpus of dialogues. Furthermore, using state-of-the-art methods, we provide a benchmark for profile generation on this novel dataset. Our experiments disclose the challenges of profile generation, and we hope that this introduces a new research direction.", "output": "PGTask: Introducing the Task of Profile Generation from Dialogues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controllable text generation (CTG) by large language models has a huge potential to transform education for teachers and students alike. Specifically, high quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the questions have errors and/or pedagogically unhelpful content.
We conduct a human evaluation with teachers to assess the quality and usefulness of outputs from combining CTG and question taxonomies (Bloom's and a difficulty taxonomy). The results demonstrate that the questions generated are high quality and sufficiently useful, showing their promise for widespread use in the classroom setting.", "output": "How Useful are Educational Questions Generated by Large Language Models?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It has been reported that clustering-based topic models, which cluster high-quality sentence embeddings with an appropriate word selection method, can generate better topics than generative probabilistic topic models. However, these approaches suffer from the inability to select appropriate parameters and incomplete models that overlook the quantitative relation between words with topics and topics with text. To solve these issues, we propose graph to topic (G2T), a simple but effective framework for topic modelling. The framework is composed of four modules. First, document representation is acquired using pretrained language models. Second, a semantic graph is constructed according to the similarity between document representations. Third, communities in document semantic graphs are identified, and the relationship between topics and documents is quantified accordingly. Fourth, the word--topic distribution is computed based on a variant of TFIDF. Automatic evaluation suggests that G2T achieved state-of-the-art performance on both English and Chinese documents with different lengths. Human judgements demonstrate that G2T can produce topics with better interpretability and coverage than baselines. In addition, G2T can not only determine the topic number automatically but also give the probabilistic distribution of words in topics and topics in documents.
Finally, G2T is publicly available, and the distillation experiments provide instruction on how it works.", "output": "G2T: A simple but versatile framework for topic modeling based on pretrained language model and community detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers distributed optimization algorithms, with application in binary classification via distributed support-vector-machines (D-SVM) over multi-agent networks subject to some link nonlinearities. The agents solve a consensus-constraint distributed optimization cooperatively via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to existing literature that mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) and the other eigenvalues all have negative real parts. This is done by recalling arguments from matrix perturbation theory. Then, the solution is shown to converge to the agreement state under certain conditions. For example, the gradient tracking (GT) step size is tighter than the linear case by factors related to the upper/lower sector bounds.
To the best of our knowledge, no existing work in distributed optimization and learning literature considers non-ideal link conditions.", "output": "D-SVM over Networked Systems with Non-Ideal Linking Conditions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.", "output": "Do deep neural networks have an inbuilt Occam's razor?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts.
In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-objective portfolio optimisation is a critical problem researched across various fields of study as it achieves the objective of maximising the expected return while minimising the risk of a given portfolio at the same time. However, many studies fail to include realistic constraints in the model, which limits practical trading strategies. This study introduces realistic constraints, such as transaction and holding costs, into an optimisation model. Due to the non-convex nature of this problem, metaheuristic algorithms, such as NSGA-II, R-NSGA-II, NSGA-III and U-NSGA-III, will play a vital role in solving the problem. Furthermore, a learnheuristic approach is taken as surrogate models enhance the metaheuristics employed.
These algorithms are then compared to the baseline metaheuristic algorithms, which solve a constrained, multi-objective optimisation problem without using learnheuristics. The results of this study show that, despite taking significantly longer to run to completion, the learnheuristic algorithms outperform the baseline algorithms in terms of hypervolume and rate of convergence. Furthermore, the backtesting results indicate that utilising learnheuristics to generate weights for asset allocation leads to a lower risk percentage, higher expected return and higher Sharpe ratio than backtesting without using learnheuristics. This leads us to conclude that using learnheuristics to solve a constrained, multi-objective portfolio optimisation problem produces superior and preferable results than solving the problem without using learnheuristics.", "output": "A Learnheuristic Approach to A Constrained Multi-Objective Portfolio Optimisation Problem."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search.
Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.", "output": "OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel way of solving the issue of classification of out-of-vocabulary gestures using Artificial Neural Networks (ANNs) trained in the Generative Adversarial Network (GAN) framework. A generative model augments the data set in an online fashion with new samples and stochastic target vectors, while a discriminative model determines the class of the samples. The approach was evaluated on the UC2017 SG and UC2018 DualMyo data sets. The generative models' performance was measured with a distance metric between generated and real samples. The discriminative models were evaluated by their accuracy on trained and novel classes. In terms of sample generation quality, the GAN is significantly better than a random distribution (noise) in mean distance, for all classes. In the classification tests, the baseline neural network was not capable of identifying untrained gestures. When the proposed methodology was implemented, we found that there is a trade-off between the detection of trained and untrained gestures, with some trained samples being mistaken as novelty.
Nevertheless, a novelty detection accuracy of 95.4% or 90.2% (depending on the data set) was achieved with just 5% loss of accuracy on trained classes.", "output": "Improving novelty detection with generative adversarial networks on hand gesture data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts.
Please see the project website (url{ for video comparisons.", "output": "Learning Controllable 3D Diffusion Models from Single-view Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Individual human decision-makers may benefit from different forms of support to improve decision outcomes. However, a key question is which form of support will lead to accurate decisions at a low cost. In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide. We consider decision-makers for whom we have no prior information and formalize learning their respective policies as a multi-objective optimization problem that trades off accuracy and cost. Using techniques from stochastic contextual bandits, we propose $\\texttt{THREAD}$, an online algorithm to personalize a decision support policy for each decision-maker, and devise a hyper-parameter tuning strategy to identify a cost-performance trade-off using simulated human behavior. We provide computational experiments to demonstrate the benefits of $\\texttt{THREAD}$ compared to offline baselines. We then introduce $\\texttt{Modiste}$, an interactive tool that provides $\\texttt{THREAD}$ with an interface. We conduct human subject experiments to show how $\\texttt{Modiste}$ learns policies personalized to each decision-maker and discuss the nuances of learning decision support policies online for real users.", "output": "Learning Personalized Decision Support Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density.
However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.", "output": "Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods; and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group.
By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.", "output": "Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance.
We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.", "output": "Expressive Text-to-Image Generation with Rich Text."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Previous hypergraph expansions are solely carried out on either vertex level or hyperedge level, thereby missing the symmetric nature of data co-occurrence, and resulting in information loss. To address the problem, this paper treats vertices and hyperedges equally and proposes a new hypergraph formulation named the \\emph{line expansion (LE)} for hypergraph learning. The new expansion bijectively induces a homogeneous structure from the hypergraph by treating vertex-hyperedge pairs as \"line nodes\". By reducing the hypergraph to a simple graph, the proposed \\emph{line expansion} makes existing graph learning algorithms compatible with the higher-order structure and has been proven as a unifying framework for various hypergraph expansions. We evaluate the proposed line expansion on five hypergraph datasets; the results show that our method beats SOTA baselines by a significant margin.", "output": "Semi-supervised Hypergraph Node Classification on Hypergraph Line Expansion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Overfitting data is a well-known phenomenon related with the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various--sometimes heuristic--regularization techniques, which are motivated by developing upper bounds to the generalization error.
In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods that was already recognized to act effectively as a regularization penalty. We further observe connections with well-studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.", "output": "The Role of Mutual Information in Variational Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals.
Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.", "output": "A method to integrate and classify normal distributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of {\\it dependent compositional functions} with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with {\\it provable convergence guarantee under mild conditions} by leveraging recent advances in stochastic compositional optimization.
Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. The SOAP has been implemented in the libAUC library at~url{", "output": "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the infinite-horizon LQR control problem. Motivated by competitive analysis in online learning, as a criterion for controller design we introduce the dynamic regret, defined as the difference between the LQR cost of a causal controller (that has only access to past disturbances) and the LQR cost of the \\emph{unique} clairvoyant one (that has also access to future disturbances) that is known to dominate all other controllers. The regret itself is a function of the disturbances, and we propose to find a causal controller that minimizes the worst-case regret over all bounded energy disturbances. The resulting controller has the interpretation of guaranteeing the smallest regret compared to the best non-causal controller that can see the future. We derive explicit formulas for the optimal regret and for the regret-optimal controller for the state-space setting. These explicit solutions are obtained by showing that the regret-optimal control problem can be reduced to a Nehari extension problem that can be solved explicitly. The regret-optimal controller is shown to be linear and can be expressed as the sum of the classical $H_2$ state-feedback law and an $n$-th order controller ($n$ is the state dimension), and its construction simply requires a solution to the standard LQR Riccati equation and two Lyapunov equations.
Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the $H_2$ and the $H_\\infty$ optimal controllers, and generally has $H_2$ and $H_\\infty$ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control systems design.", "output": "Regret-Optimal LQR Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never \"leaves\" personal devices, FL is often presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive, honest-but-curious attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this work, we show a novel data reconstruction attack which allows an active and dishonest central party to efficiently extract user data from the received gradients. While prior work on data reconstruction in FL relies on solving computationally expensive optimization problems or on making easily detectable modifications to the shared model's architecture or parameters, in our attack the central party makes inconspicuous changes to the shared model's weights before sending them out to the users. We call the modified weights of our attack trap weights. Our active attacker is able to recover user data perfectly, i.e., with zero error, even when this data stems from the same class. Recovery comes with near-zero costs: the attack requires no complex optimization objectives. 
Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. These specificities enable our attack to scale to fully-connected and convolutional deep neural networks trained with large mini-batches of data. For example, for the high-dimensional vision dataset ImageNet, we perfectly reconstruct more than 50% of the training data points from mini-batches as large as 100 data points.", "output": "When the Curious Abandon Honesty: Federated Learning Is Not Private."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertical federated learning (VFL) system has recently become prominent as a concept to process data distributed across many individual sources without the need to centralize it. Multiple participants collaboratively train models based on their local data in a privacy-aware manner. To date, VFL has become a de facto solution to securely learn a model among organizations, allowing knowledge to be shared without compromising privacy of any individuals. Despite the prosperous development of VFL systems, we find that certain inputs of a participant, named adversarial dominating inputs (ADIs), can dominate the joint inference towards the direction of the adversary's will and force other (victim) participants to make negligible contributions, losing rewards that are usually offered regarding the importance of their contributions in federated learning scenarios. We conduct a systematic study on ADIs by first proving their existence in typical VFL systems. We then propose gradient-based methods to synthesize ADIs of various formats and exploit common VFL systems. 
We further launch greybox fuzz testing, guided by the saliency score of ``victim'' participants, to perturb adversary-controlled inputs and systematically explore the VFL attack surface in a privacy-preserving manner. We conduct an in-depth study on the influence of critical parameters and settings in synthesizing ADIs. Our study reveals new VFL attack opportunities, promoting the identification of unknown threats before breaches and building more secure VFL systems.", "output": "ADI: Adversarial Dominating Inputs in Vertical Federated Learning Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.", "output": "GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Methods for teaching motion skills to robots focus on training for a single skill at a time. 
Robots capable of learning from demonstration can considerably benefit from the added ability to learn new movement skills without forgetting what was learned in the past. To this end, we propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers. We empirically demonstrate the effectiveness of this approach in remembering long sequences of trajectory learning tasks without the need to store any data from past demonstrations. Our results show that hypernetworks outperform other state-of-the-art continual learning approaches for learning from demonstration. In our experiments, we use the popular LASA benchmark, and two new datasets of kinesthetic demonstrations collected with a real robot that we introduce in this paper called the HelloWorld and RoboTasks datasets. We evaluate our approach on a physical robot and demonstrate its effectiveness in learning real-world robotic tasks involving changing positions as well as orientations. We report both trajectory error metrics and continual learning metrics, and we propose two new continual learning metrics. Our code, along with the newly collected datasets, is available at ", "output": "Continual Learning from Demonstration of Robotics Skills."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new consistency-based approach for defining and solving nonnegative/positive matrix and tensor completion problems. The novelty of the framework is that instead of artificially making the problem well-posed in the form of an application-arbitrary optimization problem, e.g., minimizing a bulk structural measure such as rank or norm, we show that a single property/constraint: preserving unit-scale consistency, guarantees the existence of both a solution and, under relatively weak support assumptions, uniqueness. 
The framework and solution algorithms also generalize directly to tensors of arbitrary dimensions while maintaining computational complexity that is linear in problem size for fixed dimension d. In the context of recommender system (RS) applications, we prove that two reasonable properties that should be expected to hold for any solution to the RS problem are sufficient to permit uniqueness guarantees to be established within our framework. Key theoretical contributions include a general unit-consistent tensor-completion framework with proofs of its properties, e.g., consensus-order and fairness, and algorithms with optimal runtime and space complexities, e.g., O(1) term-completion with preprocessing complexity that is linear in the number of known terms of the matrix/tensor. From a practical perspective, the seamless ability of the framework to generalize to exploit high-dimensional structural relationships among key state variables, e.g., user and product attributes, offers a means for extracting significantly more information than is possible for alternative methods that cannot generalize beyond direct user-product relationships. Finally, we propose our consensus ordering property as an admissibility criterion for any proposed RS method.", "output": "Tensor Completion with Provable Consistency and Fairness Guarantees for Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "``Benign overfitting'', the ability of certain algorithms to interpolate noisy training data and yet perform well out-of-sample, has been a topic of considerable recent interest. We show, using a fixed design setup, that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. 
In particular, the estimated predictor does not converge to the ground truth with increasing sample size, for any non-zero regression function and any (even adaptive) bandwidth selection. To prove these results, we give exact expressions for the generalization error, and its decomposition in terms of an approximation error and an estimation error that elicits a trade-off based on the selection of the kernel bandwidth. Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.", "output": "On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps. Bongard-HOI presents a substantial challenge to today's visual recognition models. 
The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction while even amateur human testers on MTurk have 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.", "output": "Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead we advocate a method called Cohort Shapley that is grounded in economic game theory and unlike most other game theoretic methods, it uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features. 
We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on.", "output": "Variable importance without impossible data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.", "output": "Generative Modelling With Inverse Heat Dissipation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Densely annotating LiDAR point clouds is costly, which restrains the scalability of fully-supervised learning methods. In this work, we study the underexplored semi-supervised learning (SSL) in LiDAR segmentation. Our core idea is to leverage the strong spatial cues of LiDAR point clouds to better exploit unlabeled data. 
We propose LaserMix to mix laser beams from different LiDAR scans, and then encourage the model to make consistent and confident predictions before and after mixing. Our framework has three appealing properties: 1) Generic: LaserMix is agnostic to LiDAR representations (e.g., range view and voxel), and hence our SSL framework can be universally applied. 2) Statistically grounded: We provide a detailed analysis to theoretically explain the applicability of the proposed framework. 3) Effective: Comprehensive experimental analysis on popular LiDAR segmentation datasets (nuScenes, SemanticKITTI, and ScribbleKITTI) demonstrates our effectiveness and superiority. Notably, we achieve competitive results over fully-supervised counterparts with 2x to 5x fewer labels and improve the supervised-only baseline significantly by 10.8% on average. We hope this concise yet high-performing framework could facilitate future research in semi-supervised LiDAR segmentation. Code is publicly available.", "output": "LaserMix for Semi-Supervised LiDAR Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given its vast application on online social networks, Influence Maximization (IM) has garnered considerable attention over the last couple of decades. Due to the intricacy of IM, most current research concentrates on estimating the first-order contribution of the nodes to select a seed set, disregarding the higher-order interplay between different seeds. Consequently, the actual influence spread frequently deviates from expectations, and it remains unclear how the seed set quantitatively contributes to this deviation. To address this deficiency, this work dissects the influence exerted on individual seeds and their higher-order interactions utilizing the Sobol index, a variance-based sensitivity analysis. 
To adapt to IM contexts, seed selection is phrased as binary variables and split into distributions of varying orders. Based on our analysis with various Sobol indices, an IM algorithm dubbed SIM is proposed to improve the performance of current IM algorithms by over-selecting nodes followed by strategic pruning. A case study is carried out to demonstrate that the explanation of the impact effect can dependably identify the key higher-order interactions among seeds. SIM is empirically proved to be superior in effectiveness and competitive in efficiency by experiments on synthetic and real-world graphs.", "output": "Understanding Influence Maximization via Higher-Order Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven reduced-order models often fail to make accurate forecasts of high-dimensional nonlinear dynamical systems that are sensitive along coordinates with low-variance because such coordinates are often truncated, e.g., by proper orthogonal decomposition, kernel principal component analysis, and autoencoders. Such systems are encountered frequently in shear-dominated fluid flows where non-normality plays a significant role in the growth of disturbances. In order to address these issues, we employ ideas from active subspaces to find low-dimensional systems of coordinates for model reduction that balance adjoint-based information about the system's sensitivity with the variance of states along trajectories. The resulting method, which we refer to as covariance balancing reduction using adjoint snapshots (CoBRAS), is analogous to balanced truncation with state and adjoint-based gradient covariance matrices replacing the system Gramians and obeying the same key transformation laws. Here, the extracted coordinates are associated with an oblique projection that can be used to construct Petrov-Galerkin reduced-order models. 
We provide an efficient snapshot-based computational method analogous to balanced proper orthogonal decomposition. This also leads to the observation that the reduced coordinates can be computed relying on inner products of state and gradient samples alone, allowing us to find rich nonlinear coordinates by replacing the inner product with a kernel function. In these coordinates, reduced-order models can be learned using regression. We demonstrate these techniques and compare to a variety of other methods on a simple, yet challenging three-dimensional system and a nonlinear axisymmetric jet flow simulation with $10^5$ state variables.", "output": "Model Reduction for Nonlinear Systems by Balanced Truncation of State and Gradient Covariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of graph structure identification, i.e., of recovering the graph of dependencies among time series. We model these time series data as components of the state of linear stochastic networked dynamical systems. We assume partial observability, where the state evolution of only a subset of nodes comprising the network is observed. We devise a new feature vector computed from the observed time series and prove that these features are linearly separable, i.e., there exists a hyperplane that separates the cluster of features associated with connected pairs of nodes from those associated with disconnected pairs. This renders the features amenable to train a variety of classifiers to perform causal inference. In particular, we use these features to train Convolutional Neural Networks (CNNs). The resulting causal inference mechanism outperforms state-of-the-art counterparts w.r.t. sample-complexity. The trained CNNs generalize well over structurally distinct networks (dense or sparse) and noise-level profiles. 
Remarkably, they also generalize well to real-world networks while trained over a synthetic network (realization of a random graph). Finally, the proposed method consistently reconstructs the graph in a pairwise manner, that is, by deciding if an edge or arrow is present or absent in each pair of nodes, from the corresponding time series of each pair. This fits the framework of large-scale systems, where observation or processing of all nodes in the network is prohibitive.", "output": "Recovering the Graph Underlying Networked Dynamical Systems under Partial Observability: A Deep Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Network (Deep Learning) is a modern model in Artificial Intelligence and it has been exploited in Survival Analysis. Although several improvements have been shown by previous works, training an excellent deep learning model requires a huge amount of data, which may not hold in practice. To address this challenge, we develop a Kullback-Leibler-based (KL) deep learning procedure to integrate external survival prediction models with newly collected time-to-event data. Time-dependent KL discrimination information is utilized to measure the discrepancy between the external and internal data. This is the first work considering using prior information to deal with short data problem in Survival Analysis for deep learning. Simulation and real data results show that the proposed model achieves better performance and higher robustness compared with previous works.", "output": "KL-divergence Based Deep Learning for Discrete Time Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How can we design protein sequences folding into the desired structures effectively and efficiently? 
AI methods for structure-based protein design have attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold could achieve 51.66% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, PiFold achieves 58.72% and 60.42% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement. The PyTorch code is available at \\href{", "output": "PiFold: Toward effective and efficient protein inverse folding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The method of random Fourier features (RFF), proposed in a seminal paper by Rahimi and Recht (NIPS'07), is a powerful technique to find approximate low-dimensional representations of points in (high-dimensional) kernel space, for shift-invariant kernels. While RFF has been analyzed under various notions of error guarantee, the ability to preserve the kernel distance with \\emph{relative} error is less understood. We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions. 
We complement this by showing as long as the shift-invariant kernel is analytic, RFF with $\\mathrm{poly}(\\epsilon^{-1} \\log n)$ dimensions achieves $\\epsilon$-relative error for pairwise kernel distance of $n$ points, and the dimension bound is improved to $\\mathrm{poly}(\\epsilon^{-1} \\log k)$ for the specific application of kernel $k$-means. Finally, going beyond RFF, we make the first step towards data-oblivious dimension-reduction for general shift-invariant kernels, and we obtain a similar $\\mathrm{poly}(\\epsilon^{-1} \\log n)$ dimension bound for Laplacian kernels. We also validate the dimension-error tradeoff of our methods on simulated datasets, and they demonstrate superior performance compared with other popular methods including random-projection and Nystr\\\"{o}m methods.", "output": "On The Relative Error of Random Fourier Features for Preserving Kernel Distance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian neural networks (BNNs) have received an increased interest in the last years. In BNNs, a complete posterior distribution of the unknown weight and bias parameters of the network is produced during the training stage. This probabilistic estimation offers several advantages with respect to point-wise estimates, in particular, the ability to provide uncertainty quantification when predicting new data. This feature, inherent to the Bayesian paradigm, is useful in countless machine learning applications. It is particularly appealing in areas where decision-making has a crucial impact, such as medical healthcare or autonomous driving. The main challenge of BNNs is the computational cost of the training procedure since Bayesian techniques often face a severe curse of dimensionality. Adaptive importance sampling (AIS) is one of the most prominent Monte Carlo methodologies benefiting from sound convergence guarantees and ease of adaptation. 
This work aims to show that AIS constitutes a successful approach for designing BNNs. More precisely, we propose a novel algorithm PMCnet that includes an efficient adaptation mechanism, exploiting geometric information on the complex (often multimodal) posterior distribution. Numerical results illustrate the excellent performance and the improved exploration capabilities of the proposed method for both shallow and deep neural networks.", "output": "Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. 
For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.", "output": "On Distillation of Guided Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Progress in deep learning highlights the tremendous potential of utilizing diverse robotic datasets for attaining effective generalization and makes it enticing to consider leveraging broad datasets for attaining robust generalization in robotic learning as well. However, in practice, we often want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore we ask: how can we leverage existing diverse offline datasets in combination with small amounts of task-specific data to solve new tasks, while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. 
We present pre-training for robots (PTR), a framework based on offline RL that attempts to effectively learn new tasks by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. PTR utilizes an existing offline RL method, conservative Q-learning (CQL), but extends it to include several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens. We also demonstrate that PTR can enable effective autonomous fine-tuning and improvement in a handful of trials, without needing any demonstrations. An accompanying overview video can be found in the supplementary material and at this anonymous URL:", "output": "Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. 
These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal, especially 3D, reasoning models. We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.", "output": "SQA3D: Situated Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation and other content that could be identified as harmful. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of ``truth'' used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high accuracy and performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcing of false beliefs. Using collaborative ethnography and theoretical insights from social studies of science and expertise, we offer a critical analysis of the process of building ML models for (mis)information classification: we identify a series of algorithmic contingencies--key moments during model development that could lead to different future outcomes, uncertainty and harmful effects as these tools are deployed by social media platforms. 
We conclude by offering a tentative path toward reflexive and responsible development of ML tools for moderating misinformation and other harmful content online.", "output": "Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate time prediction of patients' critical events is crucial in urgent scenarios where timely decision-making is important. Though many studies have proposed automatic prediction methods using Electronic Health Records (EHR), their coarse-grained time resolutions limit their practical usage in urgent environments such as the emergency department (ED) and intensive care unit (ICU). Therefore, in this study, we propose an hourly prediction method based on self-supervised predictive coding and multi-modal fusion for two critical tasks: mortality and vasopressor need prediction. Through extensive experiments, we prove significant performance gains from both multi-modal fusion and self-supervised predictive regularization, most notably in far-future prediction, which becomes especially important in practice.
Our uni-modal/bi-modal/bi-modal self-supervision scored 0.846/0.877/0.897 (0.824/0.855/0.886) and 0.817/0.820/0.858 (0.807/0.81/0.855) with mortality (far-future mortality) and with vasopressor need (far-future vasopressor need) prediction data in AUROC, respectively.", "output": "Self-Supervised Predictive Coding with Multimodal Fusion for Patient Deterioration Prediction in Fine-grained Time Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.", "output": "Iterative Teaching by Data Hallucination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of computing mixed Nash equilibria of two-player zero-sum games with continuous sets of pure strategies and with first-order access to the payoff function. This problem arises for example in game-theory-inspired machine learning applications, such as distributionally-robust learning.
In those applications, the strategy sets are high-dimensional and thus methods based on discretisation cannot tractably return high-accuracy solutions. In this paper, we introduce and analyze a particle-based method that enjoys guaranteed local convergence for this problem. This method consists in parametrizing the mixed strategies as atomic measures and applying proximal point updates to both the atoms' weights and positions. It can be interpreted as a time-implicit discretization of the \"interacting\" Wasserstein-Fisher-Rao gradient flow. We prove that, under non-degeneracy assumptions, this method converges at an exponential rate to the exact mixed Nash equilibrium from any initialization satisfying a natural notion of closeness to optimality. We illustrate our results with numerical experiments and discuss applications to max-margin and distributionally-robust classification using two-layer neural networks, where our method has a natural interpretation as a simultaneous training of the network's weights and of the adversarial distribution.", "output": "An Exponentially Converging Particle Method for the Mixed Nash Equilibrium of Continuous Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of testing whether two uncharacterized quantum devices behave in the same way is crucial for benchmarking near-term quantum computers and quantum simulators, but has so far remained open for continuous-variable quantum systems. In this Letter, we develop a machine learning algorithm for comparing unknown continuous variable states using limited and noisy data. The algorithm works on non-Gaussian quantum states for which similarity testing could not be achieved with previous techniques.
Our approach is based on a convolutional neural network that assesses the similarity of quantum states based on a lower-dimensional state representation built from measurement data. The network can be trained offline with classically simulated data from a fiducial set of states sharing structural similarities with the states to be tested, or with experimental data generated by measurements on the fiducial states, or with a combination of simulated and experimental data. We test the performance of the model on noisy cat states and states generated by arbitrary selective number-dependent phase gates. Our network can also be applied to the problem of comparing continuous variable states across different experimental platforms, with different sets of achievable measurements, and to the problem of experimentally testing whether two states are equivalent up to Gaussian unitary transformations.", "output": "Quantum Similarity Testing with Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal DAG (Directed Acyclic Graph) usually lies in a 2D plane without distinguishing correlation changes and causal effects. Also, the causal effect is often approximately estimated by averaging the population's correlation changes. Now, AI (Artificial Intelligence) enables much larger-scale structural modeling, whose complex hidden confoundings make the approximation errors no longer ignorable but can snowball to considerable population-level Causal Representation Bias. Such bias has caused significant problems: ungeneralizable causal models, unrevealed individual-level features, not utilizable causal knowledge in DL (Deep Learning), etc.
In short, DAG must be redefined to enable a new framework for causal AI. Observational time series can only reflect correlation changes in statistics. But the DL-based autoencoder can represent them as individual-level feature changes in latent space to reflect causal effects. In this paper, we introduce the redefined do-DAG concept and propose Causal Representation Learning (CRL) framework as the generic solution, along with a novel architecture to realize CRL and experimentally verify its feasibility.", "output": "Realization of Causal Representation Learning and Redefined DAG for Causal AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.
We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at:", "output": "Token Turing Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing Graph Neural Networks (GNNs) follow the message-passing mechanism that conducts information interaction among nodes iteratively. While considerable progress has been made, such node interaction paradigms still have the following limitation. First, the scalability limitation precludes the broad application of GNNs in large-scale industrial settings since the node interaction among rapidly expanding neighbors incurs high computation and memory costs. Second, the over-smoothing problem restricts the discrimination ability of nodes, i.e., node representations of different classes will converge to indistinguishable after repeated node interactions. In this work, we propose a novel hop interaction paradigm to address these limitations simultaneously. The core idea is to convert the interaction target among nodes to pre-processed multi-hop features inside each node. We design a simple yet effective HopGNN framework that can easily utilize existing GNNs to achieve hop interaction. Furthermore, we propose a multi-task learning strategy with a self-supervised learning objective to enhance HopGNN. We conduct extensive experiments on 12 benchmark datasets in a wide range of domains, scales, and smoothness of graphs. Experimental results show that our methods achieve superior performance while maintaining high scalability and efficiency.
The code is at", "output": "From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely. For instance, an institution might provide a dataset of chest radiographs containing labels denoting the presence of pneumonia, while another institution might have a focus on determining the presence of metastases in the lung. Training a single AI model utilizing all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL) for collaborative training on such data. Using 695,000 chest radiographs from five institutions from across the globe - each with differing labels - we demonstrate that having heterogeneously labeled datasets, FFL-based training leads to significant performance increase compared to conventional FL training, where only the uniformly annotated images are utilized.
We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from research and simulation phase to the real-world applications in healthcare.", "output": "Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.", "output": "Face Generation and Editing with StyleGAN: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new approach to learning the subgrid-scale model when simulating partial differential equations (PDEs) solved by the method of lines and their representation in chaotic ordinary differential equations, based on neural ordinary differential equations (NODEs). Solving systems with fine temporal and spatial grid scales is an ongoing computational challenge, and closure models are generally difficult to tune.
Machine learning approaches have increased the accuracy and efficiency of computational fluid dynamics solvers. In this approach neural networks are used to learn the coarse- to fine-grid map, which can be viewed as subgrid-scale parameterization. We propose a strategy that uses the NODE and partial knowledge to learn the source dynamics at a continuous level. Our method inherits the advantages of NODEs and can be used to parameterize subgrid scales, approximate coupling operators, and improve the efficiency of low-order solvers. Numerical results with the two-scale Lorenz 96 ODE, the convection-diffusion PDE, and the viscous Burgers' PDE are used to illustrate this approach.", "output": "Learning Subgrid-scale Models with Neural Ordinary Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is a framework for users to jointly train a machine learning model. FL is promoted as a privacy-enhancing technology (PET) that provides data minimization: data never \"leaves\" personal devices and users share only model updates with a server (e.g., a company) coordinating the distributed training. While prior work showed that in vanilla FL a malicious server can extract users' private data from the model updates, in this work we take it further and demonstrate that a malicious server can reconstruct user data even in hardened versions of the protocol. More precisely, we propose an attack against FL protected with distributed differential privacy (DDP) and secure aggregation (SA). Our attack method is based on the introduction of sybil devices that deviate from the protocol to expose individual users' data for reconstruction by the server.
The underlying root cause for the vulnerability to our attack is a power imbalance: the server orchestrates the whole protocol and users are given little guarantees about the selection of other users participating in the protocol. Moving forward, we discuss requirements for privacy guarantees in FL. We conclude that users should only participate in the protocol when they trust the server or they apply local primitives such as local DP, shifting power away from the server. Yet, the latter approaches come at significant overhead in terms of performance degradation of the trained model, making them less likely to be deployed in practice.", "output": "Reconstructing Individual Data Points in Federated Learning Hardened with Differential Privacy and Secure Aggregation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "$L_{p}$-norm regularization schemes such as $L_{0}$, $L_{1}$, and $L_{2}$-norm regularization and $L_{p}$-norm-based regularization techniques such as weight decay and group LASSO compute a quantity which depends on model weights considered in isolation from one another. This paper describes a novel regularizer which is not based on an $L_{p}$-norm. In contrast with $L_{p}$-norm-based regularization, this regularizer is concerned with the spatial arrangement of weights within a weight matrix. This regularizer is an additive term for the loss function and is differentiable, simple and fast to compute, scale-invariant, requires a trivial amount of additional memory, and can easily be parallelized.
Empirically this method yields approximately a one order-of-magnitude improvement in the number of nonzero model parameters at a given level of accuracy.", "output": "A Novel Sparse Regularizer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.", "output": "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Architecture Search (NAS) is widely used to automatically obtain the neural network with the best performance among a large number of candidate architectures. To reduce the search time, zero-shot NAS aims at designing training-free proxies that can predict the test performance of a given architecture.
However, as shown recently, none of the zero-shot proxies proposed to date can actually work consistently better than a naive proxy, namely, the number of network parameters (#Params). To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo works better than State-Of-The-Art (SOTA) proxies on several popular NAS-Benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as the ones found by one-shot and multi-shot NAS methods, but with much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at ", "output": "ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces.
We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.", "output": "Efficient Graph Field Integrators Meet Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT).
This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.", "output": "Towards Inferential Reproducibility of Machine Learning Research."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the last few years, key architectural advances have been proposed for neural network interatomic potentials (NNIPs), such as incorporating message-passing networks, equivariance, or many-body expansion terms. Although modern NNIP models exhibit small differences in energy/forces errors, improvements in accuracy are still considered the main target when developing new NNIP architectures. In this work, we show how architectural and optimization choices influence the generalization of NNIPs, revealing trends in molecular dynamics (MD) stability, data efficiency, and loss landscapes. Using the 3BPA dataset, we show that test errors in NNIP follow a scaling relation and can be robust to noise, but cannot predict MD stability in the high-accuracy regime. To circumvent this problem, we propose the use of loss landscape visualizations and a metric of loss entropy for predicting the generalization power of NNIPs. With a large-scale study on NequIP and MACE, we show that the loss entropy predicts out-of-distribution error and MD stability despite being computed only on the training set. Using this probe, we demonstrate how the choice of optimizers, loss function weighting, data normalization, and other architectural decisions influence the extrapolation behavior of NNIPs.
Finally, we relate loss entropy to data efficiency, demonstrating that flatter landscapes also predict learning curve slopes. Our work provides a deep learning justification for the extrapolation performance of many common NNIPs, and introduces tools beyond accuracy metrics that can be used to inform the development of next-generation models.", "output": "Data efficiency and extrapolation trends in neural network interatomic potentials."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.", "output": "With Shared Microexponents, A Little Shifting Goes a Long Way."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning agent behaviors from observational data has shown to improve our understanding of their decision-making processes, advancing our ability to explain their interactions with the environment and other agents.
While multiple learning techniques have been proposed in the literature, there is one particular setting that has not been explored yet: multi agent systems where agent identities remain anonymous. For instance, in financial markets labeled data that identifies market participant strategies is typically proprietary, and only the anonymous state-action pairs that result from the interaction of multiple market participants are publicly available. As a result, sequences of agent actions are not observable, restricting the applicability of existing work. In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. We frame the problem as an Imitation Learning (IL) task, and we learn a world-policy able to mimic all the agent behaviors upon different environmental states. We leverage the world-policy to explain each anonymous observation through an additive feature attribution method called SHAP (SHapley Additive exPlanations). Finally, by clustering the explanations we show that we are able to identify different agent policies and group observations accordingly. We evaluate our approach on simulated synthetic market data and a real-world financial dataset. We show that our proposal significantly and consistently outperforms the existing methods, identifying different agent strategies.", "output": "K-SHAP: Policy Clustering Algorithm for Anonymous State-Action Pairs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification.
However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we present evidence-empowered transfer learning for AD diagnosis. Unlike conventional approaches, we leverage an AD-relevant auxiliary task, namely morphological change prediction, without requiring additional MRI data. In this auxiliary task, the diagnosis model learns the evidential and transferable knowledge from morphological features in MRI scans. Experimental results demonstrate that our framework is not only effective in improving detection performance regardless of model capacity, but also more data-efficient and faithful.", "output": "Evidence-empowered Transfer Learning for Alzheimer's Disease."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform challenging creative tasks (e.g., generate code and applications to solve problems, write stories, pieces of music, etc.). In this paper, we present a collaborative game design framework that combines interactive evolution and large language models to simulate the typical human design process. We use the former to exploit users' feedback for selecting the most promising ideas and large language models for a very complex creative task - the recombination and variation of ideas. In our framework, the process starts with a brief and a set of candidate designs, either generated using a language model or proposed by the users. Next, users collaborate on the design process by providing feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs.
We evaluated our framework on three game design tasks with human designers who collaborated remotely.", "output": "ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NR). NRs have recently been used to learn the geometric and the material properties of the scenes and use the information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask the question: Does neural graphics (NG) need hardware support? We studied representative NG applications showing that, if we want to render 4k res. at 60FPS there is a gap of 1.5X-55X in the desired performance on current GPUs. For AR/VR applications, there is an even larger gap of 2-4 OOM between the desired performance and the required system power. We identify that the input encoding and the MLP kernels are the performance bottlenecks, consuming 72%, 60% and 59% of application time for multi res. hashgrid, multi res. densegrid and low res. densegrid encodings, respectively. We propose a NG processing cluster, a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the rest of the kernels by fusing them together in Vulkan, which leads to 9.94X kernel-level performance improvement compared to un-fused implementation of the pre-processing and the post-processing kernels. Our results show that, NGPC gives up to 58X end-to-end application-level performance improvement, for multi res.
hashgrid encoding on average across the four NG applications, the performance benefits are 12X, 20X, 33X and 39X for the scaling factor of 8, 16, 32 and 64, respectively. Our results show that with multi res. hashgrid encoding, NGPC enables the rendering of 4k res. at 30FPS for NeRF and 8k res. at 120FPS for all our other NG applications.", "output": "Hardware Acceleration of Neural Graphics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure that we adopt to the discrete optimization setting by using a random projection approach. We prove the convergence of our proposed algorithm, SDPSA, using a differential inclusions approach and show that it finds the optimal value of n in n-step TD. Through experiments, we show that the optimal value of n is achieved with SDPSA for arbitrary initial values.", "output": "n-Step Temporal Difference Learning with Optimal n."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have emerged as a popular strategy for handling non-Euclidean data due to their state-of-the-art performance. However, most of the current GNN model designs mainly focus on task accuracy, lacking in considering hardware resources limitation and real-time requirements of edge application scenarios. Comprehensive profiling of typical GNN models indicates that their execution characteristics are significantly affected across different computing platforms, which demands hardware awareness for efficient GNN designs. 
In this work, HGNAS is proposed as the first Hardware-aware Graph Neural Architecture Search framework targeting resource-constrained edge devices. By decoupling the GNN paradigm, HGNAS constructs a fine-grained design space and leverages an efficient multi-stage search strategy to explore optimal architectures within a few GPU hours. Moreover, HGNAS achieves hardware awareness during the GNN architecture design by leveraging a hardware performance predictor, which could balance the GNN model accuracy and efficiency corresponding to the characteristics of targeted devices. Experimental results show that HGNAS can achieve about $10.6\\times$ speedup and $88.2\\%$ peak memory reduction with a negligible accuracy loss compared to DGCNN on various edge devices, including Nvidia RTX3080, Jetson TX2, Intel i7-8700K and Raspberry Pi 3B+.", "output": "Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information regarding an individual's gait, body mass, and posture, making them an attractive tool for health monitoring, security, and human-computer interaction. However, the presence of various types of noise can compromise the accuracy of footstep-induced signal analysis. 
In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. In the preprocessing stage, features are extracted using the Fast Fourier Transform and wavelet transform to capture the underlying physics-governed dynamics of the system and extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with FFT results, and a CNN is used to obtain a condensed feature representation of the signal. In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals, which outperforms the prevailing algorithms for a wide range of noise levels, evaluated using PSNR, SNR, and WMAPE.", "output": "Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. 
This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.", "output": "Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). 
The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.", "output": "Using AI to Measure Parkinson's Disease Severity at Home."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications, due to its fast per-iteration complexity. However, one major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions, even asymptotically close to the solution. We view this as an artifact of discretization; that is to say, the Frank-Wolfe \\emph{flow}, which is its trajectory at asymptotically small step sizes, does not zig-zag, and reducing discretization error will go hand-in-hand in producing a more stabilized method, with better convergence properties. We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error, and whose local convergence rate over general convex sets accelerates from a rate of $O(1/k)$ to up to $O(1/k^{3/2})$.", "output": "Reducing Discretization Error in the Frank-Wolfe Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches performing well in one task often exhibit poor performance in the other. 
This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\\mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.", "output": "EGC: Image Generation and Classification via a Diffusion Energy-Based Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Response curves exhibit the magnitude of the response of a sensitive system to a varying stimulus. However, response of such systems may be sensitive to multiple stimuli (i.e., input features) that are not necessarily independent. As a consequence, the shape of response curves generated for a selected input feature (referred to as \"active feature\") might depend on the values of the other input features (referred to as \"passive features\"). In this work, we consider the case of systems whose response is approximated using regression neural networks. 
We propose to use counterfactual explanations (CFEs) for the identification of the features with the highest relevance on the shape of response curves generated by neural network black boxes. CFEs are generated by a genetic algorithm-based approach that solves a multi-objective optimization problem. In particular, given a response curve generated for an active feature, a CFE finds the minimum combination of passive features that need to be modified to alter the shape of the response curve. We tested our method on a synthetic dataset with 1-D inputs and two crop yield prediction datasets with 2-D inputs. The relevance ranking of features and feature combinations obtained on the synthetic dataset coincided with the analysis of the equation that was used to generate the problem. Results obtained on the yield prediction datasets revealed that the impact on fertilizer responsivity of passive features depends on the terrain characteristics of each field.", "output": "Counterfactual Explanations of Neural Network-Generated Response Curves."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In applied fields where the speed of inference and model flexibility are crucial, the use of Bayesian inference for models with a stochastic process as their prior, e.g. Gaussian processes (GPs) is ubiquitous. Recent literature has demonstrated that the computational bottleneck caused by GP priors or their finite realizations can be encoded using deep generative models such as variational autoencoders (VAEs), and the learned generators can then be used instead of the original priors during Markov chain Monte Carlo (MCMC) inference in a drop-in manner. 
While this approach enables fast and highly efficient inference, it loses information about the stochastic process hyperparameters, and, as a consequence, makes inference over hyperparameters impossible and the learned priors indistinct. We propose to resolve this issue and disentangle the learned priors by conditioning the VAE on stochastic process hyperparameters. This way, the hyperparameters are encoded alongside GP realisations and can be explicitly estimated at the inference stage. We believe that the new method, termed PriorCVAE, will be a useful tool among approximate inference approaches and has the potential to have a large impact on spatial and spatiotemporal inference in crucial real-life applications. Code showcasing PriorCVAE can be found on GitHub: ", "output": "PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. 
OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at ", "output": "OpenAGI: When LLM Meets Domain Experts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is a new distributed learning paradigm, with privacy, utility, and efficiency as its primary pillars. Existing research indicates that it is unlikely to simultaneously attain infinitesimal privacy leakage, utility loss, and efficiency. Therefore, how to find an optimal trade-off solution is the key consideration when designing the FL algorithm. One common way is to cast the trade-off problem as a multi-objective optimization problem, i.e., the goal is to minimize the utility loss and efficiency reduction while constraining the privacy leakage not exceeding a predefined value. 
However, existing multi-objective optimization frameworks are very time-consuming, and do not guarantee the existence of the Pareto frontier. This motivates us to seek a solution to transform the multi-objective problem into a single-objective problem because it is more efficient and easier to be solved. To this end, we propose FedPAC, a unified framework that leverages PAC learning to quantify multiple objectives in terms of sample complexity; such quantification allows us to constrain the solution space of multiple objectives to a shared dimension, so that it can be solved with the help of a single-objective optimization algorithm. Specifically, we provide the results and detailed analyses of how to quantify the utility loss, privacy leakage, privacy-utility-efficiency trade-off, as well as the cost of the attacker from the PAC learning perspective.", "output": "Probably Approximately Correct Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Score-based diffusion models learn to reverse a stochastic differential equation that maps data to noise. However, for complex tasks, numerical error can compound and result in highly unnatural samples. Previous work mitigates this drift with thresholding, which projects to the natural data domain (such as pixel space for images) after each diffusion step, but this leads to a mismatch between the training and generative processes. To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data. Our approach learns the perturbed score function through a generalized score matching loss and extends key components of standard diffusion models including diffusion guidance, likelihood-based training, and ODE sampling. 
We also bridge the theoretical gap with thresholding: such schemes are just discretizations of reflected SDEs. On standard image benchmarks, our method is competitive with or surpasses the state of the art and, for classifier-free guidance, our approach enables fast exact sampling with ODEs and produces more faithful samples under high guidance weight.", "output": "Reflected Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a deep-learning based approach for measuring small planetary radial velocities in the presence of stellar variability. We use neural networks to reduce stellar RV jitter in three years of HARPS-N sun-as-a-star spectra. We develop and compare dimensionality-reduction and data splitting methods, as well as various neural network architectures including single line CNNs, an ensemble of single line CNNs, and a multi-line CNN. We inject planet-like RVs into the spectra and use the network to recover them. We find that the multi-line CNN is able to recover planets with 0.2 m/s semi-amplitude, 50 day period, with 8.8% error in the amplitude and 0.7% in the period. This approach shows promise for mitigating stellar RV variability and enabling the detection of small planetary RVs with unprecedented precision.", "output": "Deep-learning based measurement of planetary radial velocities in the presence of stellar variability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent data regulations necessitate machine unlearning (MU): The removal of the effect of specific examples from the model. 
While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed `prune first, then unlearn' and `sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at", "output": "Model Sparsification Can Simplify Machine Unlearning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. 
In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speed up elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.", "output": "Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of approaches for trajectory prediction requires metrics to validate and compare their performance. Currently established metrics are based on Euclidean distance, which means that errors are weighted equally in all directions. Euclidean metrics are insufficient for structured environments like roads, since they do not properly capture the agent's intent relative to the underlying lane. In order to provide a reasonable assessment of trajectory prediction approaches with regard to the downstream planning task, we propose a new metric that is lane distance-based: Lane Miss Rate (LMR). For the calculation of LMR, the ground-truth and predicted endpoints are assigned to lane segments, more precisely their centerlines. 
Measured by the distance along the lane segments, predictions that are within a certain threshold distance to the ground-truth count as hits, otherwise they count as misses. LMR is then defined as the ratio of sequences that yield a miss. Our results on three state-of-the-art trajectory prediction models show that LMR preserves the order of Euclidean distance-based metrics. In contrast to the Euclidean Miss Rate, qualitative results show that LMR yields misses for sequences where predictions are located on wrong lanes. Hits on the other hand result for sequences where predictions are located on the correct lane. This means that LMR implicitly weights Euclidean error relative to the lane and goes into the direction of capturing intents of traffic agents. The source code of LMR for Argoverse 2 is publicly available.", "output": "LMR: Lane Distance-Based Metric for Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ImageReward -- the first general-purpose text-to-image human preference reward model -- to address various prevalent issues in generative models and align them with human values and preferences. Its training is based on our systematic annotation pipeline that covers both the rating and ranking components, collecting a dataset of 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP by 38.6%), making it a promising automatic metric for evaluating and improving text-to-image synthesis. 
The reward model is publicly available via the \\texttt{image-reward} package at \\url{", "output": "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data and training or deploying these state-of-the-art methods to resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images achieving $94.86\\%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62\\%$. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. 
Our code and models will be released at a later date before the conference.", "output": "Astroformer: More Data Might Not be All You Need for Classification."}] \ No newline at end of file +[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cybersecurity has emerged as a critical challenge for the industry. With the large complexity of the security landscape, sophisticated and costly deep learning models often fail to provide timely detection of cyber threats on edge devices. Brain-inspired hyperdimensional computing (HDC) has been introduced as a promising solution to address this issue. However, existing HDC approaches use static encoders and require very high dimensionality and hundreds of training iterations to achieve reasonable accuracy. This results in a serious loss of learning efficiency and causes huge latency for detecting attacks. In this paper, we propose CyberHD, an innovative HDC learning framework that identifies and regenerates insignificant dimensions to capture complicated patterns of cyber threats with remarkably lower dimensionality. Additionally, the holographic distribution of patterns in high dimensional space provides CyberHD with notably high robustness against hardware errors.", "output": "Late Breaking Results: Scalable and Efficient Hyperdimensional Computing for Network Intrusion Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. 
The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights. In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally.", "output": "Meta-Learned Models of Cognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans excel at continually acquiring, consolidating, and retaining information from an ever-changing environment, whereas artificial neural networks (ANNs) exhibit catastrophic forgetting. There are considerable differences in the complexity of synapses, the processing of information, and the learning mechanisms in biological neural networks and their artificial counterparts, which may explain the mismatch in performance. 
We consider a biologically plausible framework that constitutes separate populations of exclusively excitatory and inhibitory neurons that adhere to Dale's principle, and the excitatory pyramidal neurons are augmented with dendritic-like structures for context-dependent processing of stimuli. We then conduct a comprehensive study on the role and interactions of different mechanisms inspired by the brain, including sparse non-overlapping representations, Hebbian learning, synaptic consolidation, and replay of past activations that accompanied the learning event. Our study suggests that employing multiple complementary mechanisms in a biologically plausible architecture, similar to the brain, may be effective in enabling continual learning in ANNs.", "output": "A Study of Biologically Plausible Neural Network: The Role and Interactions of Brain-Inspired Mechanisms in Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. 
ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT across different model sizes. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models. We release our implementation at: ", "output": "Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) as a means of addressing this problem, wherein generative models are fine-tuned using RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment of generative models, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models more effectively. 
Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently assembles a streaming dataset. This dataset serves as the basis for aligning the generative model and can be employed under both offline and online settings. Notably, the sample generation process within RAFT is gradient-free, rendering it compatible with black-box generators. Through extensive experiments, we demonstrate that our proposed algorithm exhibits strong performance in the context of both large language models and diffusion models.", "output": "RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "ChatGPT (GPT) has become one of the most talked-about innovations in recent years, with over 100 million users worldwide. However, there is still limited knowledge about the sources of information GPT utilizes. As a result, we carried out a study focusing on the sources of information within the field of environmental science. In our study, we asked GPT to identify the ten most significant subdisciplines within the field of environmental science. We then asked it to compose a scientific review article on each subdiscipline, including 25 references. We proceeded to analyze these references, focusing on factors such as the number of citations, publication date, and the journal in which the work was published. Our findings indicate that GPT tends to cite highly-cited publications in environmental science, with a median citation count of 1184.5. 
It also exhibits a preference for older publications, with a median publication year of 2010, and predominantly refers to well-respected journals in the field, with Nature being the most cited journal by GPT. Interestingly, our findings suggest that GPT seems to exclusively rely on citation count data from Google Scholar for the works it cites, rather than utilizing citation information from other scientific databases such as Web of Science or Scopus. In conclusion, our study suggests that Google Scholar citations play a significant role as a predictor for mentioning a study in GPT-generated content. This finding reinforces the dominance of Google Scholar among scientific databases and perpetuates the Matthew Effect in science, where the rich get richer in terms of citations. With many scholars already utilizing GPT for literature review purposes, we can anticipate further disparities and an expanding gap between lesser-cited and highly-cited publications.", "output": "ChatGPT cites the most-cited articles and journals, relying solely on Google Scholar's citation counts. As a result, AI may amplify the Matthew Effect in environmental science."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. 
We first investigate the potential of many existing FMs by testing their performances on seven tasks across multiple geospatial subdomains including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, these task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing a FM for GeoAI is to address the multimodality nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model which can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges to develop such a model for GeoAI.", "output": "On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. 
We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We demonstrate the effectiveness of our model through the design of structured nonlinear photonic crystals (NLPCs) and shaped pump beams; and show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. The learning of NLPC structures offers a promising new avenue for shaping and controlling arbitrary quantum states and enables all-optical coherent control of the generated states. We believe that this approach can readily be extended from bulky crystals to thin Metasurfaces and potentially applied to other quantum systems sharing a similar Hamiltonian structures, such as superfluids and superconductors.", "output": "Designing Nonlinear Photonic Crystals for High-Dimensional Quantum State Engineering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Out-of-distribution (OOD) detection aims to identify test examples that do not belong to the training distribution and are thus unlikely to be predicted reliably. Despite a plethora of existing works, most of them focused only on the scenario where OOD examples come from semantic shift (e.g., unseen categories), ignoring other possible causes (e.g., covariate shift). In this paper, we present a novel, unifying framework to study OOD detection in a broader scope. Instead of detecting OOD examples from a particular cause, we propose to detect examples that a deployed machine learning model (e.g., an image classifier) is unable to predict correctly. That is, whether a test example should be detected and rejected or not is ``model-specific''. 
We show that this framework unifies the detection of OOD examples caused by semantic shift and covariate shift, and closely addresses the concern of applying a machine learning model to uncontrolled environments. We provide an extensive analysis that involves a variety of models (e.g., different architectures and training strategies), sources of OOD examples, and OOD detection approaches, and reveal several insights into improving and understanding OOD detection in uncontrolled environments.", "output": "Unified Out-Of-Distribution Detection: A Model-Specific Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Integrating whole-slide images (WSIs) and bulk transcriptomics for predicting patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging due to the different nature of these data: WSIs represent a very high-dimensional spatial description of a tumor, while bulk transcriptomics represent a global description of gene expression levels within that tumor. In this context, our work aims to address two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way?, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics that can encode specific cellular functions. Together with histology patch tokens that encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We propose fusing both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. 
Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas. Our interpretability framework identifies key multimodal prognostic factors, and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play. We make our code public at:", "output": "Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic Graph Neural Networks (DGNNs) are becoming increasingly popular due to their effectiveness in analyzing and predicting the evolution of complex interconnected graph-based systems. However, hardware deployment of DGNNs still remains a challenge. First, DGNNs do not fully utilize hardware resources because temporal data dependencies cause low hardware parallelism. Additionally, there is currently a lack of generic DGNN hardware accelerator frameworks, and existing GNN accelerator frameworks have limited ability to handle dynamic graphs with changing topologies and node features. To address the aforementioned challenges, in this paper, we propose DGNN-Booster, which is a novel Field-Programmable Gate Array (FPGA) accelerator framework for real-time DGNN inference using High-Level Synthesis (HLS). It includes two different FPGA accelerator designs with different dataflows that can support the most widely used DGNNs. We showcase the effectiveness of our designs by implementing and evaluating two representative DGNN models on ZCU102 board and measuring the end-to-end performance. 
The experiment results demonstrate that DGNN-Booster can achieve a speedup of up to 5.6x compared to the CPU baseline (6226R), 8.4x compared to the GPU baseline (A6000) and 2.1x compared to the FPGA baseline without applying optimizations proposed in this paper. Moreover, DGNN-Booster can achieve over 100x and over 1000x runtime energy efficiency than the CPU and GPU baseline respectively. Our implementation code and on-board measurements are publicly available at", "output": "DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task DNN models, they can be further optimized via model compression. Numerous structured pruning methods are already developed that can readily achieve speedups in single-task models, but the pruning of multi-task networks has not yet been extensively studied. In this work, we investigate the effectiveness of structured pruning on multi-task models. We use an existing single-task filter pruning criterion and also introduce an MTL-based filter pruning criterion for estimating the filter importance scores. We prune the model using an iterative pruning strategy with both pruning methods. We show that, with careful hyper-parameter tuning, architectures obtained from different pruning methods do not have significant differences in their performances across tasks when the number of parameters is similar. We also show that iterative structure pruning may not be the best way to achieve a well-performing pruned model because, at extreme pruning levels, there is a high drop in performance across all tasks. 
But when the same models are randomly initialized and re-trained, they show better results.", "output": "Structured Pruning for Multi-Task Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robots operating in real-world environments must reason about possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is the problem of confounding, which if left untreated can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems. However, due to a lack of explicit causal semantics, POMDP planning methods are prone to confounding bias and thus in the presence of unobserved confounders may produce underperforming policies. This paper presents a novel causally-informed extension of \"anytime regularized determinized sparse partially observable tree\" (AR-DESPOT), a modern anytime online POMDP planner, using causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline the partial parameterisation of the causal model for planning, from ground truth model data. 
We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher performing policies than AR-DESPOT.", "output": "CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We view the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy -- a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. The approach is based on growing a (search) game-tree in the hybrid space by combining a sampling-based planning method with a novel bandit-based technique to select and improve on partial strategies. We provide conditions under which the algorithm is probabilistically complete, i.e., if a winning strategy exists, the algorithm will almost surely find it. 
The case studies and benchmark results show that the algorithm is general and consistently outperforms the state of the art.", "output": "Sampling-based Reactive Synthesis for Nondeterministic Hybrid Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unmanned Aerial Vehicles (UAVs), specifically drones equipped with remote sensing object detection technology, have rapidly gained a broad spectrum of applications and emerged as one of the primary research focuses in the field of computer vision. Although UAV remote sensing systems have the ability to detect various objects, small-scale objects can be challenging to detect reliably due to factors such as object size, image degradation, and real-time limitations. To tackle these issues, a real-time object detection algorithm (YOLO-Drone) is proposed and applied to two new UAV platforms as well as a specific light source (silicon-based golden LED). YOLO-Drone presents several novelties: 1) including a new backbone Darknet59; 2) a new complex feature aggregation module MSPP-FPN that incorporated one spatial pyramid pooling and three atrous spatial pyramid pooling modules; 3) and the use of Generalized Intersection over Union (GIoU) as the loss function. To evaluate performance, two benchmark datasets, UAVDT and VisDrone, along with one homemade dataset acquired at night under silicon-based golden LEDs, are utilized. The experimental results show that, in both UAVDT and VisDrone, the proposed YOLO-Drone outperforms state-of-the-art (SOTA) object detection methods by improving the mAP of 10.13% and 8.59%, respectively. With regards to UAVDT, the YOLO-Drone exhibits both high real-time inference speed of 53 FPS and a maximum mAP of 34.04%. 
Notably, YOLO-Drone achieves high performance under the silicon-based golden LEDs, with a mAP of up to 87.71%, surpassing the performance of YOLO series under ordinary light sources. To conclude, the proposed YOLO-Drone is a highly effective solution for object detection in UAV applications, particularly for night detection tasks where silicon-based golden light LED technology exhibits significant superiority.", "output": "YOLO-Drone:Airborne real-time detection of dense small objects from high-altitude perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We tackle the issue of generalized category discovery (GCD). GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data contain instances from novel categories and also the labelled classes. In this paper, we address the GCD problem without a known category number in the unlabelled data. We propose a framework, named CiPR, to bootstrap the representation by exploiting Cross-instance Positive Relations for contrastive learning in the partially labelled data which are neglected in existing methods. First, to obtain reliable cross-instance relations to facilitate the representation learning, we introduce a semi-supervised hierarchical clustering algorithm, named selective neighbor clustering (SNC), which can produce a clustering hierarchy directly from the connected components in the graph constructed by selective neighbors. We also extend SNC to be capable of label assignment for the unlabelled instances with the given class number. Moreover, we present a method to estimate the unknown class number using SNC with a joint reference score considering clustering indexes of both labelled and unlabelled data. 
Finally, we thoroughly evaluate our framework on public generic image recognition datasets and challenging fine-grained datasets, all establishing the new state-of-the-art.", "output": "CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-linear manner. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training,inference) FLOPS for ResNet50 on ImageNet at 80% sparsity. Finally, AutoSparse outperforms sparse-to-sparse SotA method MEST (uniform sparsity) for 80% sparse ResNet50 with similar accuracy, where MEST uses 12% more training FLOPS and 50% more inference FLOPS.", "output": "AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt engineering and calibration make large language models excel at reasoning tasks, including multiple choice commonsense reasoning. 
From a practical perspective, we investigate and evaluate these strategies on smaller language models. Through experiments on five commonsense reasoning benchmarks, we find that each strategy favors certain models, but their joint effects are mostly negative.", "output": "Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Depth Estimation has wide reaching applications in the field of Computer vision such as target tracking, augmented reality, and self-driving cars. The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input. The traditional depth estimation methods are based on depth cues and used concepts like epipolar geometry. With the evolution of Convolutional Neural Networks, depth estimation has undergone tremendous strides. In this project, our aim is to explore possible extensions to existing SoTA Deep Learning based Depth Estimation Models and to see whether performance metrics could be further improved. In a broader sense, we are looking at the possibility of implementing Pose Estimation, Efficient Sub-Pixel Convolution Interpolation, Semantic Segmentation Estimation techniques to further enhance our proposed architecture and to provide fine-grained and more globally coherent depth map predictions. 
We also plan to do away with camera intrinsic parameters during training and apply weather augmentations to further generalize our model.", "output": "Self-Supervised Learning based Depth Estimation from Monocular Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Temporal heterogeneous information network (temporal HIN) embedding, aiming to represent various types of nodes of different timestamps into low dimensional spaces while preserving structural and semantic information, is of vital importance in diverse real-life tasks. Researchers have made great efforts on temporal HIN embedding in Euclidean spaces and got some considerable achievements. However, there is always a fundamental conflict that many real-world networks show hierarchical property and power-law distribution, and are not isometric of Euclidean spaces. Recently, representation learning in hyperbolic spaces has been proved to be valid for data with hierarchical and power-law structure. Inspired by this character, we propose a hyperbolic heterogeneous temporal network embedding (H2TNE) model for temporal HINs. Specifically, we leverage a temporally and heterogeneously double-constrained random walk strategy to capture the structural and semantic information, and then calculate the embedding by exploiting hyperbolic distance in proximity measurement. Experimental results show that our method has superior performance on temporal link prediction and node classification compared with SOTA models.", "output": "H2TNE: Temporal Heterogeneous Information Network Embedding in Hyperbolic Spaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text simplification (TS) is the process of generating easy-to-understand sentences from a given sentence or piece of text. 
The aim of TS is to reduce both the lexical (which refers to vocabulary complexity and meaning) and syntactic (which refers to the sentence structure) complexity of a given text or sentence without the loss of meaning or nuance. In this paper, we present SimpLex, a novel simplification architecture for generating simplified English sentences. To generate a simplified sentence, the proposed architecture uses either word embeddings (i.e., Word2Vec) and perplexity, or sentence transformers (i.e., BERT, RoBERTa, and GPT2) and cosine similarity. The solution is incorporated into a user-friendly and simple-to-use software. We evaluate our system using two metrics, i.e., SARI, and Perplexity Decrease. Experimentally, we observe that the transformer models outperform the other models in terms of the SARI score. However, in terms of Perplexity, the Word-Embeddings-based models achieve the biggest decrease. Thus, the main contributions of this paper are: (1) We propose a new Word Embedding and Transformer based algorithm for text simplification; (2) We design SimpLex -- a modular novel text simplification system -- that can provide a baseline for further research; and (3) We perform an in-depth analysis of our solution and compare our results with two state-of-the-art models, i.e., LightLS [19] and NTS-w2v [44]. 
We also make the code publicly available online.", "output": "SimpLex: a lexical text simplification architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Convolutional Network (GCN) with the powerful capacity to explore graph-structural data has gained noticeable success in recent years. Nonetheless, most of the existing GCN-based models suffer from the notorious over-smoothing issue, owing to which shallow networks are extensively adopted. This may be problematic for complex graph datasets because a deeper GCN should be beneficial to propagating information across remote neighbors. Recent works have devoted effort to addressing over-smoothing problems, including establishing residual connection structure or fusing predictions from multi-layer models. Because of the indistinguishable embeddings from deep layers, it is reasonable to generate more reliable predictions before conducting the combination of outputs from various layers. In light of this, we propose an Alternating Graph-regularized Neural Network (AGNN) composed of Graph Convolutional Layer (GCL) and Graph Embedding Layer (GEL). GEL is derived from the graph-regularized optimization containing Laplacian embedding term, which can alleviate the over-smoothing problem by periodic projection from the low-order feature space onto the high-order space. With more distinguishable features of distinct layers, an improved Adaboost strategy is utilized to aggregate outputs from each layer, which explores integrated embeddings of multi-hop neighbors. 
The proposed model is evaluated via a large number of experiments including performance comparison with some multi-layer or multi-order graph neural networks, which reveals the superior performance improvement of AGNN compared with state-of-the-art models.", "output": "AGNN: Alternating Graph-Regularized Neural Networks to Alleviate Over-Smoothing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Long-term stability is a critical property for deep learning-based data-driven digital twins of the Earth system. Such data-driven digital twins enable sub-seasonal and seasonal predictions of extreme environmental events, probabilistic forecasts, that require a large number of ensemble members, and computationally tractable high-resolution Earth system models where expensive components of the models can be replaced with cheaper data-driven surrogates. Owing to computational cost, physics-based digital twins, though long-term stable, are intractable for real-time decision-making. Data-driven digital twins offer a cheaper alternative to them and can provide real-time predictions. However, such digital twins can only provide short-term forecasts accurately since they become unstable when time-integrated beyond 20 days. Currently, the cause of the instabilities is unknown, and the methods that are used to improve their stability horizons are ad-hoc and lack rigorous theory. In this paper, we reveal that the universal causal mechanism for these instabilities in any turbulent flow is due to spectral bias wherein, any deep learning architecture is biased to learn only the large-scale dynamics and ignores the small scales completely. 
We further elucidate how turbulence physics and the absence of convergence in deep learning-based time-integrators amplify this bias leading to unstable error propagation. Finally, using the quasigeostrophic flow and ECMWF Reanalysis data as test cases, we bridge the gap between deep learning theory and fundamental numerical analysis to propose one mitigative solution to such instabilities. We develop long-term stable data-driven digital twins for the climate system and demonstrate accurate short-term forecasts, and hundreds of years of long-term stable time-integration with accurate mean and variability.", "output": "Long-term instabilities of deep learning-based digital twins of the climate system: The cause and a solution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning-based recommender systems (DRSs) are increasingly and widely deployed in the industry, which brings significant convenience to people's daily life in different ways. However, recommender systems are also shown to suffer from multiple issues, e.g., the echo chamber and the Matthew effect, of which the notation of \"fairness\" plays a core role. While many fairness notations and corresponding fairness testing approaches have been developed for traditional deep classification models, they are essentially hardly applicable to DRSs. One major difficulty is that there still lacks a systematic understanding and mapping between the existing fairness notations and the diverse testing requirements for deep recommender systems, not to mention further testing or debugging activities. To address the gap, we propose FairRec, a unified framework that supports fairness testing of DRSs from multiple customized perspectives, e.g., model utility, item diversity, item popularity, etc. 
We also propose a novel, efficient search-based testing approach to tackle the new challenge, i.e., double-ended discrete particle swarm optimization (DPSO) algorithm, to effectively search for hidden fairness issues in the form of certain disadvantaged groups from a vast number of candidate groups. Given the testing report, by adopting a simple re-ranking mitigation strategy on these identified disadvantaged groups, we show that the fairness of DRSs can be significantly improved. We conducted extensive experiments on multiple industry-level DRSs adopted by leading companies. The results confirm that FairRec is effective and efficient in identifying the deeply hidden fairness issues, e.g., achieving 95% testing accuracy with half to 1/8 time.", "output": "FairRec: Fairness Testing for Deep Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work proposes to reduce visibility data volume using a baseline-dependent lossy compression technique that preserves smearing at the edges of the field-of-view. We exploit the relation of the rank of a matrix and the fact that a low-rank approximation can describe the raw visibility data as a sum of basic components where each basic component corresponds to a specific Fourier component of the sky distribution. As such, the entire visibility data is represented as a collection of data matrices from baselines, instead of a single tensor. The proposed methods are formulated as follows: provided a large dataset of the entire visibility data; the first algorithm, named $simple~SVD$ projects the data into a regular sampling space of rank$-r$ data matrices. In this space, the data for all the baselines has the same rank, which makes the compression factor equal across all baselines. The second algorithm, named $BDSVD$ projects the data into an irregular sampling space of rank$-r_{pq}$ data matrices.
The subscript $pq$ indicates that the rank of the data matrix varies across baselines $pq$, which makes the compression factor baseline-dependent. MeerKAT and the European Very Long Baseline Interferometry Network are used as reference telescopes to evaluate and compare the performance of the proposed methods against traditional methods, such as traditional averaging and baseline-dependent averaging (BDA). For the same spatial resolution threshold, both $simple~SVD$ and $BDSVD$ show effective compression by two orders of magnitude higher than traditional averaging and BDA. At the same space-saving rate, there is no decrease in spatial resolution and there is a reduction in the noise variance in the data which improves the S/N to over $1.5$ dB at the edges of the field-of-view.", "output": "Lossy Compression of Large-Scale Radio Interferometric Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique submissions that outperformed the provided SotA baseline on any of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones.
These results represent significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance and overall natural image accuracy.", "output": "The Second Monocular Depth Estimation Challenge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset.
The benchmark is now made publicly available at:", "output": "Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces DroidBot-GPT, a tool that utilizes GPT-like large language models (LLMs) to automate the interactions with Android mobile applications. Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task. It works by translating the app GUI state information and the available actions on the smartphone screen to natural language prompts and asking the LLM to make a choice of actions. Since the LLM is typically trained on a large amount of data including the how-to manuals of diverse software applications, it has the ability to make reasonable choices of actions based on the provided information. We evaluate DroidBot-GPT with a self-created dataset that contains 33 tasks collected from 17 Android applications spanning 10 categories. It can successfully complete 39.39% of the tasks, and the average partial completion progress is about 66.76%. Given the fact that our method is fully unsupervised (no modification required from both the app and the LLM), we believe there is great potential to enhance automation performance with better app development paradigms and/or custom model training.", "output": "DroidBot-GPT: GPT-powered UI Automation for Android."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reasoning on knowledge graphs is a challenging task because it utilizes observed information to predict the missing one.
Specifically, answering first-order logic formulas is of particular interest because of its clear syntax and semantics. Recently, the query embedding method has been proposed which learns the embedding of a set of entities and treats logic operations as set operations. Though there has been much research following the same methodology, it lacks a systematic inspection from the standpoint of logic. In this paper, we characterize the scope of queries investigated previously and precisely identify the gap between it and the whole family of existential formulas. Moreover, we develop a new dataset containing ten new formulas and discuss the new challenges coming simultaneously. Finally, we propose a new search algorithm from fuzzy logic theory which is capable of solving new formulas and outperforming the previous methods in existing formulas.", "output": "On Existential First Order Queries Inference on Knowledge Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). State-of-the-art EA approaches generally use Graph Neural Networks (GNNs) to encode entities. However, most of them train the models and evaluate the results in a full-batch fashion, which prohibits EA from being scalable on large-scale datasets. To enhance the usability of GNN-based EA models in real-world applications, we present SEA, a scalable entity alignment system that enables to (i) train large-scale GNNs for EA, (ii) speed up the normalization and the evaluation process, and (iii) report clear results for users to estimate different models and parameter settings. SEA can be run on a computer with merely one graphic card. Moreover, SEA encompasses six state-of-the-art EA models and provides access for users to quickly establish and evaluate their own models.
Thus, SEA allows users to perform EA without being involved in tedious implementations, such as negative sampling and GPU-accelerated evaluation. With SEA, users can gain a clear view of the model performance. In the demonstration, we show that SEA is user-friendly and is of high scalability even on computers with limited computational resources.", "output": "SEA: A Scalable Entity Alignment System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Structured reconstruction is a non-trivial dense prediction problem, which extracts structural information (e.g., building corners and edges) from a raster image, then reconstructs it to a 2D planar graph accordingly. Compared with common segmentation or detection problems, it significantly relies on the capability of leveraging holistic geometric information for structural reasoning. Current transformer-based approaches tackle this challenging problem in a two-stage manner, which detect corners in the first model and classify the proposed edges (corner-pairs) in the second model. However, they separate the two stages into different models and only share the backbone encoder. Unlike the existing modeling strategies, we present an enhanced corner representation method: 1) It fuses knowledge between the corner detection and edge prediction by sharing features in different granularity; 2) Corner candidates are proposed in four heatmap channels w.r.t. its direction. Both qualitative and quantitative evaluations demonstrate that our proposed method can better reconstruct fine-grained structures, such as adjacent corners and tiny edges.
Consequently, it outperforms the state-of-the-art model by +1.9%@F-1 on Corner and +3.0%@F-1 on Edge.", "output": "CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic and periodic recompiling of building databases with up-to-date high-resolution images has become a critical requirement for rapidly developing urban environments. However, the architecture of most existing approaches for change extraction attempts to learn features related to changes but ignores objectives related to buildings. This inevitably leads to the generation of significant pseudo-changes, due to factors such as seasonal changes in images and the inclination of building façades. To alleviate the above-mentioned problems, we developed a contrastive learning approach by validating historical building footprints against single up-to-date remotely sensed images. This contrastive learning strategy allowed us to inject the semantics of buildings into a pipeline for the detection of changes, which is achieved by increasing the distinguishability of features of buildings from those of non-buildings. In addition, to reduce the effects of inconsistencies between historical building polygons and buildings in up-to-date images, we employed a deformable convolutional neural network to learn offsets intuitively. In summary, we formulated a multi-branch building extraction method that identifies newly constructed and removed buildings, respectively. To validate our method, we conducted comparative experiments using the public Wuhan University building change detection dataset and a more practical dataset named SI-BU that we established. Our method achieved F1 scores of 93.99% and 70.74% on the above datasets, respectively.
Moreover, when the data of the public dataset were divided in the same manner as in previous related studies, our method achieved an F1 score of 94.63%, which surpasses that of the state-of-the-art method.", "output": "BCE-Net: Reliable Building Footprints Change Extraction based on Historical Map and Up-to-Date Images using Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of the generalization of the relations learned. In recent years, several innovations in DNNs have been developed in order to enable learning abstract relations from images. In this work, we systematically evaluate a series of DNNs that integrate mechanisms such as slot attention, recurrently guided attention, and external memory, in the simplest possible visual reasoning task: deciding whether two objects are the same or different. We found that, although some models performed better than others in generalizing the same-different relation to specific types of images, no model was able to generalize this relation across the board. We conclude that abstract visual reasoning remains largely an unresolved challenge for DNNs.", "output": "The role of object-centric representations, guided attention, and external memory on generalizing visual relations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper summarizes our contributions to the document-grounded dialog tasks at the 9th and 10th Dialog System Technology Challenges (DSTC9 and DSTC10).
In both iterations the task consists of three subtasks: first detect whether the current turn is knowledge seeking, second select a relevant knowledge document, and third generate a response grounded on the selected document. For DSTC9 we proposed different approaches to make the selection task more efficient. The best method, Hierarchical Selection, actually improves the results compared to the original baseline and gives a speedup of 24x. In the DSTC10 iteration of the task, the challenge was to adapt systems trained on written dialogs to perform well on noisy automatic speech recognition transcripts. Therefore, we proposed data augmentation techniques to increase the robustness of the models as well as methods to adapt the style of generated responses to fit well into the preceding dialog. Additionally, we proposed a noisy channel model that allows for increasing the factuality of the generated responses. In addition to summarizing our previous contributions, in this work, we also report on a few small improvements and reconsider the automatic evaluation metrics for the generation task which have shown a low correlation to human judgments.", "output": "Task-oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point clouds are widely regarded as one of the best dataset types for urban mapping purposes. Hence, point cloud datasets are commonly investigated as benchmark types for various urban interpretation methods. Yet, few researchers have addressed the use of point cloud benchmarks for façade segmentation. Robust façade segmentation is becoming a key factor in various applications ranging from simulating autonomous driving functions to preserving cultural heritage.
In this work, we present a method of enriching existing point cloud datasets with façade-related classes that have been designed to facilitate façade segmentation testing. We propose how to efficiently extend existing datasets and comprehensively assess their potential for façade segmentation. We use the method to create the TUM-FAÇADE dataset, which extends the capabilities of TUM-MLS-2016. Not only can TUM-FAÇADE facilitate the development of point-cloud-based façade segmentation tasks, but our procedure can also be applied to enrich further datasets.", "output": "TUM-FA\\c{C}ADE: Reviewing and enriching point cloud benchmarks for fa\\c{c}ade segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WJS0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of signal length distribution and its impact on training efficiency. It is demonstrated that, for specific distributions, applying specific TSL limits results in better performance. This is shown to be mainly due to randomly sampling the start index of the waveforms resulting in more unique examples for training.
A SepFormer model trained using a TSL limit of 4.42s and dynamic mixing (DM) is shown to match the best-performing SepFormer model trained with DM and unlimited signal lengths. Furthermore, the 4.42s TSL limit results in a 44% reduction in training time with WHAMR.", "output": "On Data Sampling Strategies for Training Neural Network Speech Separation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The car-following (CF) model is the core component for traffic simulations and has been built-in in many production vehicles with Advanced Driving Assistance Systems (ADAS). Research of CF behavior allows us to identify the sources of different macro phenomena induced by the basic process of pairwise vehicle interaction. The CF behavior and control model encompasses various fields, such as traffic engineering, physics, cognitive science, machine learning, and reinforcement learning. This paper provides a comprehensive survey highlighting differences, complementarities, and overlaps among various CF models according to their underlying logic and principles. We reviewed representative algorithms, ranging from the theory-based kinematic models, stimulus-response models, and cruise control models to data-driven Behavior Cloning (BC) and Imitation Learning (IL) and outlined their strengths and limitations. This review categorizes CF models that are conceptualized in varying principles and summarizes the vast literature with a holistic framework.", "output": "A Review on Longitudinal Car-Following Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A key challenge for a reinforcement learning (RL) agent is to incorporate external/expert advice in its learning.
The desired goals of an algorithm that can shape the learning of an RL agent with external advice include (a) maintaining policy invariance; (b) accelerating the learning of the agent; and (c) learning from arbitrary advice [3]. To address this challenge this paper formulates the problem of incorporating external advice in RL as a multi-armed bandit called shaping-bandits. The reward of each arm of shaping bandits corresponds to the return obtained by following the expert or by following a default RL algorithm learning on the true environment reward. We show that directly applying existing bandit and shaping algorithms that do not reason about the non-stationary nature of the underlying returns can lead to poor results. Thus we propose UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES (LPIES), three different shaping algorithms built on different assumptions that reason about the long-term consequences of following the expert policy or the default RL algorithm. Our experiments in four different settings show that these proposed algorithms achieve the above-mentioned goals whereas the other algorithms fail to do so.", "output": "Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "GPT-3 and several other language models (LMs) can effectively address various natural language processing (NLP) tasks, including machine translation and text summarization. Recently, they have also been successfully employed in the business process management (BPM) domain, e.g., for predictive process monitoring and process extraction from text. This, however, typically requires fine-tuning the employed LM, which, among others, necessitates large amounts of suitable training data.
A possible solution to this problem is the use of prompt engineering, which leverages pre-trained LMs without fine-tuning them. Recognizing this, we argue that prompt engineering can help bring the capabilities of LMs to BPM research. We use this position paper to develop a research agenda for the use of prompt engineering for BPM research by identifying the associated potentials and challenges.", "output": "Just Tell Me: Prompt Engineering in Business Process Management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the last few years, we have not seen any major developments in model-free or model-based learning methods that would make one obsolete relative to the other. In most cases, the used technique is heavily dependent on the use case scenario or other attributes, e.g. the environment. Both approaches have their own advantages, for example, sample efficiency or computational efficiency. However, when combining the two, the advantages of each can be combined and hence achieve better performance. The TD-MPC framework is an example of this approach. On the one hand, a world model in combination with model predictive control is used to get a good initial estimate of the value function. On the other hand, a Q function is used to provide a good long-term estimate. Similar to algorithms like MuZero a latent state representation is used, where only task-relevant information is encoded to reduce the complexity. In this paper, we propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation given the internal state representation. This allows our agent to have a more stable learning signal during training and also improves sample efficiency.
Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks from the DeepMind Control suite.", "output": "Model Predictive Control with Self-supervised Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this technical report, we evaluated the performance of the ChatGPT and GPT-3 models for the task of vulnerability detection in code. Our evaluation was conducted on our real-world dataset, using binary and multi-label classification tasks on CWE vulnerabilities. We decided to evaluate the model because it has shown good performance on other code-based tasks, such as solving programming challenges and understanding code at a high level. However, we found that the ChatGPT model performed no better than a dummy classifier for both binary and multi-label classification tasks for code vulnerability detection.", "output": "Evaluation of ChatGPT Model for Vulnerability Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses. Recent advances in deep learning have enabled localization using monocular visual cameras. While structure from motion (SfM) predicts the absolute pose from a point cloud, absolute pose regression (APR) methods learn a semantic understanding of the environment through neural networks. However, both fields face challenges caused by the environment such as motion blur, lighting changes, repetitive patterns, and feature-less structures.
This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods. The optical flow between consecutive images is computed using the Lucas-Kanade algorithm, and the relative pose is predicted using an auxiliary small recurrent convolutional network. The fusion of absolute and relative poses is a complex task due to the mismatch between the global and local coordinate systems. State-of-the-art methods fusing absolute and relative poses use pose graph optimization (PGO) to regularize the absolute pose predictions using relative poses. In this work, we propose recurrent fusion networks to optimally align absolute and relative pose predictions to improve the absolute pose prediction. We evaluate eight different recurrent units and construct a simulation environment to pre-train the APR and RPR networks for better generalized training. Additionally, we record a large database of different scenarios in a challenging large-scale indoor environment that mimics a warehouse with transportation robots. We conduct hyperparameter searches and experiments to show the effectiveness of our recurrent fusion method compared to PGO.", "output": "Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models pre-trained on large self-supervised corpora, followed by task-specific fine-tuning has become the dominant paradigm in NLP. These pre-training datasets often have a one-to-many structure--e.g. in dialogue there are many valid responses for a given context. However, only some of these responses will be desirable in our downstream task.
This raises the question of how we should train the model such that it can emulate the desirable behaviours, but not the undesirable ones. Current approaches train in a one-to-one setup--only a single target response is given for a single dialogue context--leading to models only learning to predict the average response, while ignoring the full range of possible responses. Using text-based games as a testbed, our approach, PASA, uses discrete latent variables to capture the range of different behaviours represented in our larger pre-training dataset. We then use knowledge distillation to distil the posterior probability distribution into a student model. This probability distribution is far richer than learning from only the hard targets of the dataset, and thus allows the student model to benefit from the richer range of actions the teacher model has learned. Results show up to 49% empirical improvement over the previous state-of-the-art model on the Jericho Walkthroughs dataset.", "output": "Learn What Is Possible, Then Choose What Is Best: Disentangling One-To-Many Relations in Language Through Text-based Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of synthetically generating data that can closely resemble human decisions made in the context of an interactive human-AI system like a computer game. We propose a novel algorithm that can generate synthetic, human-like, decision making data while starting from a very small set of decision making data collected from humans. Our proposed algorithm integrates the concept of reward shaping with an imitation learning algorithm to generate the synthetic data.
We have validated our synthetic data generation technique by using the synthetically generated data as a surrogate for human interaction data to solve three sequential decision making tasks of increasing complexity within a small computer game-like setup. Different empirical and statistical analyses of our results show that the synthetically generated data can substitute the human data and perform the game-playing tasks almost indistinguishably, with very low divergence, from a human performing the same tasks.", "output": "Synthetically Generating Human-like Data for Sequential Decision Making Tasks via Reward-Shaped Imitation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combinatorial optimization problems (COPs) are an important research topic in various fields. In recent times, there have been many attempts to solve COPs using deep learning-based approaches. We propose a novel neural network model that solves COPs involving geometry based on self-attention and a new attention mechanism. The proposed model is designed such that the model efficiently learns point-to-point relationships in COPs involving geometry using self-attention in the encoder. We propose efficient input and output sequence ordering methods that reduce ambiguities such that the model learns the sequences more regularly and effectively. Geometric COPs involve geometric requirements that need to be satisfied. In the decoder, a new masking scheme using domain knowledge is proposed to provide a high penalty when the geometric requirement of the problem is not satisfied. The proposed neural net is a flexible framework that can be applied to various COPs involving geometry. We conduct experiments to demonstrate the effectiveness of the proposed model for three COPs involving geometry: Delaunay triangulation, convex hull, and the planar Traveling Salesman problem.
Our experimental results show that the proposed model exhibits competitive performance in finding approximate solutions for solving these problems.", "output": "Learning Geometric Combinatorial Optimization Problems using Self-attention and Domain Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The traditional production paradigm of large batch production does not offer flexibility towards satisfying the requirements of individual customers. A new generation of smart factories is expected to support new multi-variety and small-batch customized production modes. For that, Artificial Intelligence (AI) is enabling higher value-added manufacturing by accelerating the integration of manufacturing and information communication technologies, including computing, communication, and control. The characteristics of a customized smart factory are to include self-perception, operations optimization, dynamic reconfiguration, and intelligent decision-making. The AI technologies will allow manufacturing systems to perceive the environment, adapt to external needs, and extract the processed knowledge, including business models, such as intelligent production, networked collaboration, and extended service models. This paper focuses on the implementation of AI in customized manufacturing (CM). The architecture of an AI-driven customized smart factory is presented. Details of intelligent manufacturing devices, intelligent information interaction, and the construction of a flexible manufacturing line are showcased. The state-of-the-art AI technologies of potential use in CM, i.e., machine learning, multi-agent systems, Internet of Things, big data, and cloud-edge computing are surveyed. The AI-enabled technologies in a customized smart factory are validated with a case study of customized packaging. 
The experimental results have demonstrated that the AI-assisted CM offers the possibility of higher production flexibility and efficiency. Challenges and solutions related to AI in CM are also discussed.", "output": "Artificial Intelligence-Driven Customized Manufacturing Factory: Key Technologies, Applications, and Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep learning-based algorithms are widely adopted due to the advantage of being able to establish anomaly detection models without or with minimal domain knowledge of the task. Instead, to train the artificial neural network more stable, it should be better to define the appropriate neural network structure or the loss function. For the training anomaly detection model, the mean squared error (MSE) function is adopted widely. On the other hand, the novel loss function, logarithmic mean squared error (LMSE), is proposed in this paper to train the neural network more stable. This study covers a variety of comparisons from mathematical comparisons, visualization in the differential domain for backpropagation, loss convergence in the training process, and anomaly detection performance. In an overall view, LMSE is superior to the existing MSE function in terms of strongness of loss convergence, anomaly detection performance. 
The LMSE function is expected to be applicable for training not only the anomaly detection model but also the general generative neural network.", "output": "Concise Logarithmic Loss Function for Robust Training of Anomaly Detection Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Background: Many published machine learning studies are irreproducible. Issues with methodology and not properly accounting for variation introduced by the algorithm themselves or their implementations are attributed as the main contributors to the irreproducibility. Problem: There exist no theoretical framework that relates experiment design choices to potential effects on the conclusions. Without such a framework, it is much harder for practitioners and researchers to evaluate experiment results and describe the limitations of experiments. The lack of such a framework also makes it harder for independent researchers to systematically attribute the causes of failed reproducibility experiments. Objective: The objective of this paper is to develop a framework that enable applied data science practitioners and researchers to understand which experiment design choices can lead to false findings and how and by this help in analyzing the conclusions of reproducibility experiments. Method: We have compiled an extensive list of factors reported in the literature that can lead to machine learning studies being irreproducible. These factors are organized and categorized in a reproducibility framework motivated by the stages of the scientific method. The factors are analyzed for how they can affect the conclusions drawn from experiments. A model comparison study is used as an example. 
Conclusion: We provide a framework that describes machine learning methodology from experimental design decisions to the conclusions inferred from them.", "output": "Sources of Irreproducibility in Machine Learning: A Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven, learning-based control methods offer the potential to improve operations in complex systems, and model-free deep reinforcement learning represents a popular approach to data-driven control. However, existing classes of algorithms present a trade-off between two important deployment requirements for real-world control: (i) practical performance guarantees and (ii) data efficiency. Off-policy algorithms make efficient use of data through sample reuse but lack theoretical guarantees, while on-policy algorithms guarantee approximate policy improvement throughout training but suffer from high sample complexity. In order to balance these competing goals, we develop a class of Generalized Policy Improvement algorithms that combines the policy improvement guarantees of on-policy methods with the efficiency of sample reuse. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a variety of continuous control tasks from the DeepMind Control Suite.", "output": "Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a methodology for generating random multi-qubit stabilizer codes based on solving a constraint satisfaction problem (CSP) on random bipartite graphs. 
This framework allows us to enforce stabilizer commutation, $X/Z$ balancing, finite rate, sparsity, and maximum-degree constraints simultaneously in a CSP that we can then solve numerically. Using a state-of-the-art CSP solver, we obtain convincing evidence for the existence of a satisfiability threshold. Furthermore, the extent of the satisfiable phase increases with the number of qubits. In that phase, finding sparse codes becomes an easy problem. Moreover, we observe that the sparse codes found in the satisfiable phase practically achieve the channel capacity for erasure noise. Our results show that intermediate-size finite-rate sparse quantum codes are easy to find, while also demonstrating a flexible methodology for generating good codes with custom properties. We therefore establish a complete and customizable pipeline for random quantum code discovery.", "output": "Finite-rate sparse quantum codes aplenty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the \\textbf{D}ependency-Enhanced \\textbf{A}daptive \\textbf{F}usion \\textbf{A}ttention (\\textbf{DAFA}), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, \\textbf{\\emph{(i)}} DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. It adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. 
Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it on BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching task.", "output": "Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines. 
The code is available at ", "output": "CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return. This leads to a problem of off-policy evaluation, where one needs to evaluate the target policy from samples collected by the often unrelated behaviour policy. Importance sampling is a traditional statistical technique that is often applied to off-policy evaluation. While importance sampling estimators are unbiased, their variance increases exponentially with the horizon of the decision process due to computing the importance weight as a product of action probability ratios, yielding estimates with low accuracy for domains involving long-term planning. This paper proposes state-based importance sampling (SIS), which drops the action probability ratios of sub-trajectories with \"negligible states\" -- roughly speaking, those for which the chosen actions have no impact on the return estimate -- from the computation of the importance weight. Theoretical results demonstrate a smaller exponent for the variance upper bound as well as a lower mean squared error. To identify negligible states, two search algorithms are proposed, one based on covariance testing and one based on state-action values. Using the formulation of SIS, we then analogously formulate state-based variants of weighted importance sampling, per-decision importance sampling, and incremental importance sampling based on the state-action value identification algorithm. Moreover, we note that doubly robust estimators may also benefit from SIS. 
Experiments in two gridworld domains and one inventory management domain show that state-based methods yield reduced variance and improved accuracy.", "output": "Low Variance Off-policy Evaluation with State-based Importance Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The evolution of wireless communications into 6G and beyond is expected to rely on new machine learning (ML)-based capabilities. These can enable proactive decisions and actions from wireless-network components to sustain quality-of-service (QoS) and user experience. Moreover, new use cases in the area of vehicular and industrial communications will emerge. Specifically in the area of vehicle communication, vehicle-to-everything (V2X) schemes will benefit strongly from such advances. With this in mind, we have conducted a detailed measurement campaign that paves the way to a plethora of diverse ML-based studies. The resulting datasets offer GPS-located wireless measurements across diverse urban environments for both cellular (with two different operators) and sidelink radio access technologies, thus enabling a variety of different studies towards V2X. The datasets are labeled and sampled with a high time resolution. Furthermore, we make the data publicly available with all the necessary information to support the onboarding of new researchers. 
We provide an initial analysis of the data showing some of the challenges that ML needs to overcome and the features that ML can leverage, as well as some hints at potential research studies.", "output": "Berlin V2X: A Machine Learning Dataset from Multiple Vehicles and Radio Access Technologies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose to integrate weapon system features (such as weapon system manufacturer, deployment time and location, storage time and location, etc.) into a parameterized Cox-Weibull [1] reliability model via a neural network, like DeepSurv [2], to improve predictive maintenance. In parallel, we develop an alternative Bayesian model by parameterizing the Weibull parameters with a neural network and employing dropout methods such as Monte-Carlo (MC)-dropout for comparative purposes. Due to data collection procedures in weapon system testing we employ a novel interval-censored log-likelihood which incorporates Monte-Carlo Markov Chain (MCMC) [3] sampling of the Weibull parameters during gradient descent optimization. We compare classification metrics such as receiver operator curve (ROC) area under the curve (AUC), precision-recall (PR) AUC, and F scores to show our model generally outperforms traditional powerful models such as XGBoost and the current standard conditional Weibull probability density estimation model.", "output": "Bayesian Weapon System Reliability Modeling with Cox-Weibull Neural Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Neural Networks (GNNs) are often used for tasks involving the geometry of a given graph, such as molecular dynamics simulation. 
Although the distance matrix of a geometric graph contains complete geometric information, it has been demonstrated that Message Passing Neural Networks (MPNNs) are insufficient for learning this geometry. In this work, we expand on the families of counterexamples that MPNNs are unable to distinguish from their distance matrices, by constructing families of novel and symmetric geometric graphs. We then propose $k$-DisGNNs, which can effectively exploit the rich geometry contained in the distance matrix. We demonstrate the high expressive power of our models by proving the universality of $k$-DisGNNs for distinguishing geometric graphs when $k \\geq 3$, and that some existing well-designed geometric models can be unified by $k$-DisGNNs as special cases. Most importantly, we establish a connection between geometric deep learning and traditional graph representation learning, showing that those highly expressive GNN models originally designed for graph structure learning can also be applied to geometric deep learning problems with impressive performance, and that existing complex, equivariant models are not the only solution. Experimental results verify our theory.", "output": "Is Distance Matrix Enough for Geometric Deep Learning?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformers architectures. 
We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.", "output": "Inseq: An Interpretability Toolkit for Sequence Generation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rewrite systems [6, 10, 12] have been widely employing equality saturation [9], which is an optimisation methodology that uses a saturated e-graph to represent all possible sequences of rewrite simultaneously, and then extracts the optimal one. As such, optimal results can be achieved by avoiding the phase-ordering problem. However, we observe that when the e-graph is not saturated, it cannot represent all possible rewrite opportunities and therefore the phase-ordering problem is re-introduced during the construction phase of the e-graph. To address this problem, we propose MCTS-GEB, a domain-general rewrite system that applies reinforcement learning (RL) to e-graph construction. At its core, MCTS-GEB uses a Monte Carlo Tree Search (MCTS) [3] to efficiently plan for the optimal e-graph construction, and therefore it can effectively eliminate the phase-ordering problem at the construction phase and achieve better performance within a reasonable time. 
Evaluation in two different domains shows MCTS-GEB can outperform the state-of-the-art rewrite systems by up to 49x, while the optimisation can generally take less than an hour, indicating MCTS-GEB is a promising building block for the future generation of rewrite systems.", "output": "MCTS-GEB: Monte Carlo Tree Search is a Good E-graph Builder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How will superhuman artificial intelligence (AI) affect human decision making? And what will be the mechanisms behind this effect? We address these questions in a domain where AI already exceeds human performance, analyzing more than 5.8 million move decisions made by professional Go players over the past 71 years (1950-2021). To address the first question, we use a superhuman AI program to estimate the quality of human decisions across time, generating 58 billion counterfactual game patterns and comparing the win rates of actual human decisions with those of counterfactual AI decisions. We find that humans began to make significantly better decisions following the advent of superhuman AI. We then examine human players' strategies across time and find that novel decisions (i.e., previously unobserved moves) occurred more frequently and became associated with higher decision quality after the advent of superhuman AI. 
Our findings suggest that the development of superhuman AI programs may have prompted human players to break away from traditional strategies and induced them to explore novel moves, which in turn may have improved their decision-making.", "output": "Superhuman Artificial Intelligence Can Improve Human Decision Making by Increasing Novelty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. 
Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.", "output": "Sparks of Artificial General Intelligence: Early experiments with GPT-4."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an ``expert model'' for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine tuning with \\emph{in-context} learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by {\\em apprenticeship learning}, where a single apprentice model is learned from data generated by massive amount of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We adopt these clusters to train massive amount of expert models specialized on different subjects. The apprentice model SuTI then learns to mimic the behavior of these experts through the proposed apprenticeship learning algorithm. 
SuTI can generate high-quality and customized subject-specific images 20x faster than optimization-based SoTA methods. On the challenging DreamBench and DreamBench-v2, our human evaluation shows that SuTI can significantly outperform existing approaches like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen while performing on par with DreamBooth.", "output": "Subject-driven Text-to-Image Generation via Apprenticeship Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of artificial intelligence for science, it is consistently an essential challenge to face a limited amount of labeled data for real-world problems. The prevailing approach is to pretrain a powerful task-agnostic model on a large unlabeled corpus but may struggle to transfer knowledge to downstream tasks. In this study, we propose InstructMol, a semi-supervised learning algorithm, to take better advantage of unlabeled examples. It introduces an instructor model to provide the confidence ratios as the measurement of pseudo-labels' reliability. These confidence scores then guide the target model to pay distinct attention to different data points, avoiding the over-reliance on labeled data and the negative influence of incorrect pseudo-annotations. Comprehensive experiments show that InstructBio substantially improves the generalization ability of molecular models, in not only molecular property predictions but also activity cliff estimations, demonstrating the superiority of the proposed method. Furthermore, our evidence indicates that InstructBio can be equipped with cutting-edge pretraining methods and used to establish large-scale and task-specific pseudo-labeled molecular datasets, which reduces the predictive errors and shortens the training process. 
Our work provides strong evidence that semi-supervised learning can be a promising tool to overcome the data scarcity limitation and advance molecular representation learning.", "output": "InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The integration of renewable energy sources into the power grid is becoming increasingly important as the world moves towards a more sustainable energy future in line with SDG 7. However, the intermittent nature of renewable energy sources can make it challenging to manage the power grid and ensure a stable supply of electricity, which is crucial for achieving SDG 9. In this paper, we propose a deep learning-based approach for predicting energy demand in a smart power grid, which can improve the integration of renewable energy sources by providing accurate predictions of energy demand. Our approach aligns with SDG 13 on climate action as it enables more efficient management of renewable energy resources. We use long short-term memory networks, which are well-suited for time series data, to capture complex patterns and dependencies in energy demand data. The proposed approach is evaluated using four datasets of historical short term energy demand data from different energy distribution companies including American Electric Power, Commonwealth Edison, Dayton Power and Light, and Pennsylvania-New Jersey-Maryland Interconnection. The proposed model is also compared with three other state of the art forecasting algorithms namely, Facebook Prophet, Support Vector Regressor, and Random Forest Regressor. 
The experimental results show that the proposed REDf model can accurately predict energy demand with a mean absolute error of 1.4%, indicating its potential to enhance the stability and efficiency of the power grid and contribute to achieving SDGs 7, 9, and 13. The proposed model also have the potential to manage the integration of renewable energy sources in an effective manner.", "output": "Predicting Short Term Energy Demand in Smart Grid: A Deep Learning Approach for Integrating Renewable Energy Sources in Line with SDGs 7, 9, and 13."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The large-scale multiobjective optimization problem (LSMOP) is characterized by simultaneously optimizing multiple conflicting objectives and involving hundreds of decision variables. Many real-world applications in engineering fields can be modeled as LSMOPs; simultaneously, engineering applications require insensitivity in performance. This requirement usually means that the results from the algorithm runs should not only be good for every run in terms of performance but also that the performance of multiple runs should not fluctuate too much, i.e., the algorithm shows good insensitivity. Considering that substantial computational resources are requested for each run, it is essential to improve upon the performance of the large-scale multiobjective optimization algorithm, as well as the insensitivity of the algorithm. However, existing large-scale multiobjective optimization algorithms solely focus on improving the performance of the algorithms, leaving the insensitivity characteristics unattended. In this work, we propose an evolutionary algorithm for solving LSMOPs based on Monte Carlo tree search, the so-called LMMOCTS, which aims to improve the performance and insensitivity for large-scale multiobjective optimization problems. 
The proposed method samples the decision variables to construct new nodes on the Monte Carlo tree for optimization and evaluation. It selects nodes with good evaluation for further search to reduce the performance sensitivity caused by large-scale decision variables. We compare the proposed algorithm with several state-of-the-art designs on different benchmark functions. We also propose two metrics to measure the sensitivity of the algorithm. The experimental results confirm the effectiveness and performance insensitivity of the proposed design for solving large-scale multiobjective optimization problems.", "output": "Improving Performance Insensitivity of Large-scale Multiobjective Optimization via Monte Carlo Tree Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Resource limitations make it hard to provide all students with one of the most effective educational interventions: personalized instruction. Reinforcement learning could be a key tool to reduce the development cost and improve the effectiveness of intelligent tutoring software that aims to provide the right support, at the right time, to a student. Here we illustrate that deep reinforcement learning can be used to provide adaptive pedagogical support to students learning about the concept of volume in a narrative storyline software. Using explainable artificial intelligence tools, we extracted interpretable insights about the pedagogical policy learned and demonstrated that the resulting policy had similar performance in a different student population. 
Most importantly, in both studies, the reinforcement-learning narrative system had the largest benefit for those students with the lowest initial pretest scores, suggesting the opportunity for AI to adapt and provide support for those most in need.", "output": "Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "ChatGPT has recently gathered attention from the general public and academia as a tool that is able to generate plausible and human-sounding text answers to various questions. One potential use, or abuse, of ChatGPT is in answering various questions or even generating whole essays and research papers in an academic or classroom setting. While recent works have explored the use of ChatGPT in the context of humanities, business school, or medical school, this work explores how ChatGPT performs in the context of an introductory computer engineering course. This work assesses ChatGPT's aptitude in answering quizzes, homework, exam, and laboratory questions in an introductory-level computer engineering course. This work finds that ChatGPT can do well on questions asking about generic concepts. However, predictably, as a text-only tool, it cannot handle questions with diagrams or figures, nor can it generate diagrams and figures. Further, also clearly, the tool cannot do hands-on lab experiments, breadboard assembly, etc., but can generate plausible answers to some laboratory manual questions. One of the key observations presented in this work is that the ChatGPT tool could not be used to pass all components of the course. Nevertheless, it does well on quizzes and short-answer questions. 
On the other hand, plausible, human-sounding answers could confuse students when generating incorrect but still plausible answers.", "output": "Analyzing ChatGPT's Aptitude in an Introductory Computer Engineering Course."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This chapter provides an introduction to Presentation Attack Detection (PAD) in fingerprint biometrics, also coined anti-spoofing, describes early developments in this field, and briefly summarizes recent trends and open issues.", "output": "Introduction to Presentation Attack Detection in Fingerprint Biometrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input, that will activate more computational units of dynamic models during inference.
We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.", "output": "GradMDM: Adversarial Attack on Dynamic Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans excel at continually acquiring, consolidating, and retaining information from an ever-changing environment, whereas artificial neural networks (ANNs) exhibit catastrophic forgetting. There are considerable differences in the complexity of synapses, the processing of information, and the learning mechanisms in biological neural networks and their artificial counterparts, which may explain the mismatch in performance. We consider a biologically plausible framework that constitutes separate populations of exclusively excitatory and inhibitory neurons that adhere to Dale's principle, and the excitatory pyramidal neurons are augmented with dendritic-like structures for context-dependent processing of stimuli. We then conduct a comprehensive study on the role and interactions of different mechanisms inspired by the brain, including sparse non-overlapping representations, Hebbian learning, synaptic consolidation, and replay of past activations that accompanied the learning event.
Our study suggests that the employing of multiple complementary mechanisms in a biologically plausible architecture, similar to the brain, may be effective in enabling continual learning in ANNs.", "output": "A Study of Biologically Plausible Neural Network: The Role and Interactions of Brain-Inspired Mechanisms in Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) as a means of addressing this problem, wherein generative models are fine-tuned using RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment of generative models, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models more effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently assembles a streaming dataset. This dataset serves as the basis for aligning the generative model and can be employed under both offline and online settings. Notably, the sample generation process within RAFT is gradient-free, rendering it compatible with black-box generators.
Through extensive experiments, we demonstrate that our proposed algorithm exhibits strong performance in the context of both large language models and diffusion models.", "output": "RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point clouds offer comprehensive and precise data regarding the contour and configuration of objects. Employing such geometric and topological 3D information of objects in class incremental learning can aid endless application in 3D-computer vision. Well known 3D-point cloud class incremental learning methods for addressing catastrophic forgetting generally entail the usage of previously encountered data, which can present difficulties in situations where there are restrictions on memory or when there are concerns about the legality of the data. Towards this we pioneer to leverage exemplar free class incremental learning on Point Clouds. In this paper we propose PointCLIMB: An exemplar Free Class Incremental Learning Benchmark. We focus on a pragmatic perspective to consider novel classes for class incremental learning on 3D point clouds. We setup a benchmark for 3D Exemplar free class incremental learning. We investigate performance of various backbones on 3D-Exemplar Free Class Incremental Learning framework. We demonstrate our results on ModelNet40 dataset.", "output": "PointCLIMB: An Exemplar-Free Point Cloud Class Incremental Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern image inpainting systems, despite the significant progress, often struggle with mask selection and holes filling.
Based on Segment-Anything Model (SAM), we make the first attempt to the mask-free image inpainting and propose a new paradigm of ``clicking and filling'', which is named as Inpaint Anything (IA). The core idea behind IA is to combine the strengths of different models in order to build a very powerful and user-friendly pipeline for solving inpainting-related problems. IA supports three main features: (i) Remove Anything: users could click on an object and IA will remove it and smooth the ``hole'' with the context; (ii) Fill Anything: after certain objects removal, users could provide text-based prompts to IA, and then it will fill the hole with the corresponding generative content via driving AIGC models like Stable Diffusion; (iii) Replace Anything: with IA, users have another option to retain the click-selected object and replace the remaining background with the newly generated scenes. We are also very willing to help everyone share and promote new projects based on our Inpaint Anything (IA). Our codes are available at", "output": "Inpaint Anything: Segment Anything Meets Image Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performances on seven tasks across multiple geospatial subdomains including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing.
Our results indicate that on several geospatial tasks that only involve text modality such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, these task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing a FM for GeoAI is to address the multimodality nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model which can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges to develop such a model for GeoAI.", "output": "On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Out-of-distribution (OOD) detection aims to identify test examples that do not belong to the training distribution and are thus unlikely to be predicted reliably. Despite a plethora of existing works, most of them focused only on the scenario where OOD examples come from semantic shift (e.g., unseen categories), ignoring other possible causes (e.g., covariate shift). In this paper, we present a novel, unifying framework to study OOD detection in a broader scope.
Instead of detecting OOD examples from a particular cause, we propose to detect examples that a deployed machine learning model (e.g., an image classifier) is unable to predict correctly. That is, whether a test example should be detected and rejected or not is ``model-specific''. We show that this framework unifies the detection of OOD examples caused by semantic shift and covariate shift, and closely addresses the concern of applying a machine learning model to uncontrolled environments. We provide an extensive analysis that involves a variety of models (e.g., different architectures and training strategies), sources of OOD examples, and OOD detection approaches, and reveal several insights into improving and understanding OOD detection in uncontrolled environments.", "output": "Unified Out-Of-Distribution Detection: A Model-Specific Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method for adding sound-guided visual effects to specific regions of videos with a zero-shot setting. Animating the appearance of the visual effect is challenging because each frame of the edited video should have visual changes while maintaining temporal consistency. Moreover, existing video editing solutions focus on temporal consistency across frames, ignoring the visual style variations over time, e.g., thunderstorm, wave, fire crackling. To overcome this limitation, we utilize temporal sound features for the dynamic style. Specifically, we guide denoising diffusion probabilistic models with an audio latent representation in the audio-visual latent space. To the best of our knowledge, our work is the first to explore sound-guided natural video editing from various sound sources with sound-specialized properties, such as intensity, timbre, and volume.
Additionally, we design optical flow-based guidance to generate temporally consistent video frames, capturing the pixel-wise relationship between adjacent frames. Experimental results show that our method outperforms existing video editing techniques, producing more realistic visual effects that reflect the properties of sound. Please visit our page: ", "output": "Soundini: Sound-Guided Diffusion for Natural Video Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Integrating whole-slide images (WSIs) and bulk transcriptomics for predicting patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging due to the different nature of these data: WSIs represent a very high-dimensional spatial description of a tumor, while bulk transcriptomics represent a global description of gene expression levels within that tumor. In this context, our work aims to address two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way?, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics that can encode specific cellular functions. Together with histology patch tokens that encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We propose fusing both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas.
Our interpretability framework identifies key multimodal prognostic factors, and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play. We make our code public at:", "output": "Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence should be established despite the differences in the execution processes and appearances between the two videos. We introduce an unsupervised method for alignment that uses global and local features of the frames. In particular, we introduce effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and VGG network. Then the features are processed and combined to construct a multidimensional time series that represent the video. The resulting time series are used to align videos of the same actions using a novel version of dynamic time warping named Diagonalized Dynamic Time Warping (DDTW). The main advantage of our approach is that no training is required, which makes it applicable for any new type of action without any need to collect training samples for it. For evaluation, we considered video synchronization and phase classification tasks on the Penn action dataset. Also, for an effective evaluation of the video synchronization task, we present a new metric called Enclosed Area Error (EAE).
The results show that our method outperforms previous state-of-the-art methods, such as TCC and other self-supervised and supervised methods.", "output": "Video alignment using unsupervised learning of local and global features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work proposes a hybrid unsupervised/supervised learning method to pretrain models applied in earth observation downstream tasks where only a handful of labels denoting very general semantic concepts are available. We combine a contrastive approach to pretrain models with a pretext task to predict spatially coarse elevation maps which are commonly available worldwide. The intuition behind is that there is generally some correlation between the elevation and targets in many remote sensing tasks, allowing the model to pre-learn useful representations. We assess the performance of our approach on a segmentation downstream task on labels gathering many possible subclasses (pixel level classification of farmlands vs. other) and an image binary classification task derived from the former, on a dataset on the north-east of Colombia. On both cases we pretrain our models with 39K unlabeled images, fine tune the downstream task only with 80 labeled images and test it with 2944 labeled images.
Our experiments show that our methods, GLCNet+Elevation for segmentation and SimCLR+Elevation for classification, outperform their counterparts without the elevation pretext task in terms of accuracy and macro-average F1, which supports the notion that including additional information correlated to targets in downstream tasks can lead to improved performance.", "output": "A contrastive method based on elevation data for remote sensing with scarce and high level semantic labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new algorithm for selection of informative frames in video action recognition. Our approach is designed for aerial videos captured using a moving camera where human actors occupy a small spatial resolution of video frames. Our algorithm utilizes the motion bias within aerial videos, which enables the selection of motion-salient frames. We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames, by measuring the similarity of patches. We use this score to assess the amount of discriminative motion information contained in one frame relative to another. We present an adaptive frame selection strategy using shifted leaky ReLu and cumulative distribution function, which ensures that the sampled frames comprehensively cover all the essential segments with high motion salience. Our approach can be integrated with any action recognition model to enhance its accuracy.
In practice, our method achieves a relative improvement of 2.2 - 13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48 datasets.", "output": "PMI Sampler: Patch similarity guided frame selection for Aerial Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in language-image models have led to the development of highly realistic images that can be generated from textual descriptions. However, the increased visual quality of these generated images poses a potential threat to the field of media forensics. This paper aims to investigate the level of challenge that language-image generation models pose to media forensics. To achieve this, we propose a new approach that leverages the DALL-E2 language-image model to automatically generate and splice masked regions guided by a text prompt. To ensure the creation of realistic manipulations, we have designed an annotation platform with human checking to verify reasonable text prompts. This approach has resulted in the creation of a new image dataset called AutoSplice, containing 5,894 manipulated and authentic images. Specifically, we have generated a total of 3,621 images by locally or globally manipulating real-world image-caption pairs, which we believe will provide a valuable resource for developing generalized detection methods in this area. The dataset is evaluated under two media forensic tasks: forgery detection and localization. Our extensive experiments show that most media forensic models struggle to detect the AutoSplice dataset as an unseen manipulation.
However, when fine-tuned models are used, they exhibit improved performance in both tasks.", "output": "AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-resolution satellite imagery is a key element for many Earth monitoring applications. Satellites such as Sentinel-2 feature characteristics that are favorable for super-resolution algorithms such as aliasing and band-misalignment. Unfortunately the lack of reliable high-resolution (HR) ground truth limits the application of deep learning methods to this task. In this work we propose L1BSR, a deep learning-based method for single-image super-resolution and band alignment of Sentinel-2 L1B 10m bands. The method is trained with self-supervision directly on real L1B data by leveraging overlapping areas in L1B images produced by adjacent CMOS detectors, thus not requiring HR ground truth. Our self-supervised loss is designed to enforce the super-resolved output image to have all the bands correctly aligned. This is achieved via a novel cross-spectral registration network (CSR) which computes an optical flow between images of different spectral bands. The CSR network is also trained with self-supervision using an Anchor-Consistency loss, which we also introduce in this work. We demonstrate the performance of the proposed approach on synthetic and real L1B data, where we show that it obtains comparable results to supervised methods.", "output": "L1BSR: Exploiting Detector Overlap for Self-Supervised Single-Image Super-Resolution of Sentinel-2 L1B Imagery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this age of information, images are a critical medium for storing and transmitting information.
With the rapid growth of image data amount, visual compression and visual data perception are two important research topics attracting a lot attention. However, those two topics are rarely discussed together and follow separate research path. Due to the compact compressed domain representation offered by learning-based image compression methods, there exists possibility to have one stream targeting both efficient data storage and compression, and machine perception tasks. In this paper, we propose a layered generative image compression model achieving high human vision-oriented image reconstructed quality, even at extreme compression ratios. To obtain analysis efficiency and flexibility, a task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks while reserves outstanding reconstructed perceptual quality, compared with traditional and learning-based codecs. In addition, joint optimization schedule is adopted to acquire best balance point among compression ratio, reconstructed image quality, and downstream perception performance. Experimental results verify that our proposed compressed domain-based multi-task analysis method can achieve comparable analysis results against the RGB image-based methods with up to 99.6% bit rate saving (i.e., compared with taking original RGB image as the analysis model input). The practical ability of our model is further justified from model size and information fidelity aspects.", "output": "Machine Perception-Driven Image Compression: A Layered Generative Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pretrained backbones with fine-tuning have been widely adopted in 2D vision and natural language processing tasks and demonstrated significant advantages to task-specific networks.
In this paper, we present a pretrained 3D backbone, named {SST}, which first outperforms all state-of-the-art methods in downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin transformer and carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large {SST} model on a synthetic Structured3D dataset that is 10 times larger than the ScanNet dataset and fine-tuned the pretrained model in various downstream real-world indoor scene understanding tasks. The results demonstrate that our model pretrained on the synthetic dataset not only exhibits good generality in both downstream segmentation and detection on real 3D point datasets, but also surpasses the state-of-the-art methods on downstream tasks after fine-tuning with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +2.1 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, +8.1 mAP@0.5 on S3DIS detection. Our method demonstrates the great potential of pretrained 3D backbones with fine-tuning for 3D understanding tasks. The code and models are available at .", "output": "Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In most image retrieval systems, images include various high-level semantics, called tags or annotations. Virtually all the state-of-the-art image annotation methods that handle imbalanced labeling are search-based techniques which are time-consuming. In this paper, a novel coupled dictionary learning approach is proposed to learn a limited number of visual prototypes and their corresponding semantics simultaneously.
This approach leads to a real-time image annotation procedure. Another contribution of this paper is that it utilizes a marginalized loss function instead of the squared loss function that is inappropriate for image annotation with imbalanced labels. We have employed a marginalized loss function in our method to leverage a simple and effective method of prototype updating. Meanwhile, we have introduced ${\ell}_1$ regularization on semantic prototypes to preserve the sparse and imbalanced nature of labels in learned semantic prototypes. Finally, comprehensive experimental results on various datasets demonstrate the efficiency of the proposed method for image annotation tasks in terms of accuracy and time. The reference implementation is publicly available on ", "output": "Toward Real-Time Image Annotation Using Marginalized Coupled Dictionary Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples. Moreover, the transferability of the adversarial examples has received broad attention in recent years, which means that adversarial examples crafted by a surrogate model can also attack unknown models. This phenomenon gave birth to the transfer-based adversarial attacks, which aim to improve the transferability of the generated adversarial examples. In this paper, we propose to improve the transferability of adversarial examples in the transfer-based attack via masking unimportant parameters (MUP). The key idea in MUP is to refine the pretrained surrogate models to boost the transfer-based attack. Based on this idea, a Taylor expansion-based metric is used to evaluate the parameter importance score and the unimportant parameters are masked during the generation of adversarial examples.
This process is simple, yet can be naturally combined with various existing gradient-based optimizers for generating adversarial examples, thus further improving the transferability of the generated adversarial examples. Extensive experiments are conducted to validate the effectiveness of the proposed MUP-based methods.", "output": "Generating Adversarial Examples with Better Transferability via Masking Unimportant Parameters of Surrogate Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Masked autoencoders (MAE) have recently been introduced to 3D self-supervised pretraining for point clouds due to their great success in NLP and computer vision. Unlike MAEs used in the image domain, where the pretext task is to restore features at the masked pixels, such as colors, the existing 3D MAE works reconstruct the missing geometry only, i.e., the location of the masked points. In contrast to previous studies, we advocate that point location recovery is inessential and restoring intrinsic point features is much superior. To this end, we propose to ignore point position reconstruction and recover high-order features at masked points including surface normals and surface variations, through a novel attention-based decoder which is independent of the encoder design.
We validate the effectiveness of our pretext task and decoder design using different encoder structures for 3D training and demonstrate the advantages of our pretrained networks on various point cloud analysis tasks.", "output": "3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating a high-quality High Dynamic Range (HDR) image from dynamic scenes has recently been extensively studied by exploiting Deep Neural Networks (DNNs). Most DNNs-based methods require a large amount of training data with ground truth, requiring tedious and time-consuming work. Few-shot HDR imaging aims to generate satisfactory images with limited data. However, it is difficult for modern DNNs to avoid overfitting when trained on only a few images. In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR. Unlike previous methods, directly recovering content and removing ghosts simultaneously, which is hard to achieve optimum, we first generate content of saturated regions with a self-supervised mechanism and then address ghosts via an iterative semi-supervised learning framework. Concretely, considering that saturated regions can be regarded as masking Low Dynamic Range (LDR) input regions, we design a Saturated Mask AutoEncoder (SMAE) to learn a robust feature representation and reconstruct a non-saturated HDR image.
We also propose an adaptive pseudo-label selection strategy to pick high-quality HDR pseudo-labels in the second stage to avoid the effect of mislabeled samples. Experiments demonstrate that SSHDR outperforms state-of-the-art methods quantitatively and qualitatively within and across different datasets, achieving appealing HDR visualization with few labeled samples.", "output": "SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of human stylization is to transfer full-body human photos to a style specified by a single art character reference image. Although previous work has succeeded in example-based stylization of faces and generic scenes, full-body human stylization is a more complex domain. This work addresses several unique challenges of stylizing full-body human images. We propose a method for one-shot fine-tuning of a pose-guided human generator to preserve the \"content\" (garments, face, hair, pose) of the input photo and the \"style\" of the artistic reference. Since body shape deformation is an essential component of an art character's style, we incorporate a novel skeleton deformation module to reshape the pose of the input person and modify the DiOr pose-guided person generator to be more robust to the rescaled poses falling outside the distribution of the realistic poses that the generator is originally trained on.
Several human studies verify the effectiveness of our approach.", "output": "One-Shot Stylization for Full-Body Human Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While having achieved great success in rich real-life applications, deep neural network (DNN) models have long been criticized for their vulnerability to adversarial attacks. Tremendous research efforts have been dedicated to mitigating the threats of adversarial attacks, but the essential trait of adversarial examples is not yet clear, and most existing methods are yet vulnerable to hybrid attacks and suffer from counterattacks. In light of this, in this paper, we first reveal a gradient-based correlation between sensitivity analysis-based DNN interpreters and the generation process of adversarial examples, which indicates the Achilles's heel of adversarial attacks and sheds light on linking together the two long-standing challenges of DNN: fragility and unexplainability. We then propose an interpreter-based ensemble framework called X-Ensemble for robust adversary defense. X-Ensemble adopts a novel detection-rectification process and features in building multiple sub-detectors and a rectifier upon various types of interpretation information toward target classifiers. Moreover, X-Ensemble employs the Random Forests (RF) model to combine sub-detectors into an ensemble detector for adversarial hybrid attacks defense. The non-differentiable property of RF further makes it a precious choice against the counterattack of adversaries.
Extensive experiments under various types of state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble to competitive baseline methods.", "output": "Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unmanned Aerial Vehicles (UAVs), specifically drones equipped with remote sensing object detection technology, have rapidly gained a broad spectrum of applications and emerged as one of the primary research focuses in the field of computer vision. Although UAV remote sensing systems have the ability to detect various objects, small-scale objects can be challenging to detect reliably due to factors such as object size, image degradation, and real-time limitations. To tackle these issues, a real-time object detection algorithm (YOLO-Drone) is proposed and applied to two new UAV platforms as well as a specific light source (silicon-based golden LED). YOLO-Drone presents several novelties: 1) including a new backbone Darknet59; 2) a new complex feature aggregation module MSPP-FPN that incorporated one spatial pyramid pooling and three atrous spatial pyramid pooling modules; 3) and the use of Generalized Intersection over Union (GIoU) as the loss function. To evaluate performance, two benchmark datasets, UAVDT and VisDrone, along with one homemade dataset acquired at night under silicon-based golden LEDs, are utilized. The experimental results show that, in both UAVDT and VisDrone, the proposed YOLO-Drone outperforms state-of-the-art (SOTA) object detection methods by improving the mAP of 10.13% and 8.59%, respectively. With regards to UAVDT, the YOLO-Drone exhibits both high real-time inference speed of 53 FPS and a maximum mAP of 34.04%.
Notably, YOLO-Drone achieves high performance under the silicon-based golden LEDs, with a mAP of up to 87.71%, surpassing the performance of YOLO series under ordinary light sources. To conclude, the proposed YOLO-Drone is a highly effective solution for object detection in UAV applications, particularly for night detection tasks where silicon-based golden light LED technology exhibits significant superiority.", "output": "YOLO-Drone:Airborne real-time detection of dense small objects from high-altitude perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We tackle the issue of generalized category discovery (GCD). GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data contain instances from novel categories and also the labelled classes. In this paper, we address the GCD problem without a known category number in the unlabelled data. We propose a framework, named CiPR, to bootstrap the representation by exploiting Cross-instance Positive Relations for contrastive learning in the partially labelled data which are neglected in existing methods. First, to obtain reliable cross-instance relations to facilitate the representation learning, we introduce a semi-supervised hierarchical clustering algorithm, named selective neighbor clustering (SNC), which can produce a clustering hierarchy directly from the connected components in the graph constructed by selective neighbors. We also extend SNC to be capable of label assignment for the unlabelled instances with the given class number. Moreover, we present a method to estimate the unknown class number using SNC with a joint reference score considering clustering indexes of both labelled and unlabelled data.
Finally, we thoroughly evaluate our framework on public generic image recognition datasets and challenging fine-grained datasets, all establishing the new state-of-the-art.", "output": "CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scene Dynamic Recovery (SDR) by inverting distorted Rolling Shutter (RS) images to an undistorted high frame-rate Global Shutter (GS) video is a severely ill-posed problem, particularly when prior knowledge about camera/object motions is unavailable. Commonly used artificial assumptions on motion linearity and data-specific characteristics, regarding the temporal dynamics information embedded in the RS scanlines, are prone to producing sub-optimal solutions in real-world scenarios. To address this challenge, we propose an event-based RS2GS framework within a self-supervised learning paradigm that leverages the extremely high temporal resolution of event cameras to provide accurate inter/intra-frame information. % In this paper, we propose to leverage the event camera to provide inter/intra-frame information as the emitted events have an extremely high temporal resolution and learn an event-based RS2GS network within a self-supervised learning framework, where real-world events and RS images can be exploited to alleviate the performance degradation caused by the domain gap between the synthesized and real data. Specifically, an Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals, including the temporal transition and spatial translation.
Exploring connections in terms of RS-RS, RS-GS, and GS-RS, we explicitly formulate mutual constraints with the proposed E-IC, resulting in supervisions without ground-truth GS images. Extensive evaluations over synthetic and real datasets demonstrate that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios. The dataset and code are available at ", "output": "Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) has been introduced to the healthcare domain as a decentralized learning paradigm that allows multiple parties to train a model collaboratively without privacy leakage. However, most previous studies have assumed that every client holds an identical label set. In reality, medical specialists tend to annotate only diseases within their knowledge domain or interest. This implies that label sets in each client can be different and even disjoint. In this paper, we propose the framework FedLSM to solve the problem Label Set Mismatch. FedLSM adopts different training strategies on data with different uncertainty levels to efficiently utilize unlabeled or partially labeled data as well as class-wise adaptive aggregation in the classification layer to avoid inaccurate aggregation when clients have missing labels. We evaluate FedLSM on two public real-world medical image datasets, including chest x-ray (CXR) diagnosis with 112,120 CXR images and skin lesion diagnosis with 10,015 dermoscopy images, and show that it significantly outperforms other state-of-the-art FL algorithms.
Code will be made available upon acceptance.", "output": "Scale Federated Learning for Label Set Mismatch in Medical Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Animating an object in 3D often requires an articulated structure, e.g. a kinematic chain or skeleton of the manipulated object with proper skinning weights, to obtain smooth movements and surface deformations. However, existing models that allow direct pose manipulations are either limited to specific object categories or built with specialized equipment. To reduce the work needed for creating animatable 3D models, we propose a novel reconstruction method that learns an animatable kinematic chain for any articulated object. Our method operates on monocular videos without prior knowledge of the object's shape or underlying structure. Our approach is on par with state-of-the-art 3D surface reconstruction methods on various articulated object categories while enabling direct pose manipulations by re-posing the learned kinematic chain.", "output": "CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also, more complex prompts involving interaction between images, e.g., \"What do image A and image B have in common?\" To support this interface, pretraining occurs over web corpora that similarly contain interleaved images+text.
To date, however, large-scale data of this form have not been publicly available. We release Multimodal C4 (mmc4), an augmentation of the popular text-only c4 corpus with images interleaved. We use a linear assignment algorithm to place images into longer bodies of text using CLIP features, a process that we show outperforms alternatives. mmc4 spans everyday topics like cooking, travel, technology, etc. A manual inspection of a random sample of documents shows that a vast majority (90%) of images are topically relevant, and that linear assignment frequently selects individual sentences specifically well-aligned with each image (78%). After filtering NSFW images, ads, etc., the corpus contains 103M documents containing 585M images interleaved with 43B English tokens.", "output": "Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mapping Low Dynamic Range (LDR) images with different exposures to High Dynamic Range (HDR) remains nontrivial and challenging on dynamic scenes due to ghosting caused by object motion or camera jitting. With the success of Deep Neural Networks (DNNs), several DNNs-based methods have been proposed to alleviate ghosting, they cannot generate approving results when motion and saturation occur. To generate visually pleasing HDR images in various cases, we propose a hybrid HDR deghosting network, called HyHDRNet, to learn the complicated relationship between reference and non-reference images. The proposed HyHDRNet consists of a content alignment subnetwork and a Transformer-based fusion subnetwork. Specifically, to effectively avoid ghosting from the source, the content alignment subnetwork uses patch aggregation and ghost attention to integrate similar content from other non-reference images with patch level and suppress undesired components with pixel level.
To achieve mutual guidance between patch-level and pixel-level, we leverage a gating module to sufficiently swap useful information both in ghosted and saturated regions. Furthermore, to obtain a high-quality HDR image, the Transformer-based fusion subnetwork uses a Residual Deformable Transformer Block (RDTB) to adaptively merge information for different exposed regions. We examined the proposed method on four widely used public HDR image deghosting datasets. Experiments demonstrate that HyHDRNet outperforms state-of-the-art methods both quantitatively and qualitatively, achieving appealing HDR visualization with unified textures and colors.", "output": "A Unified HDR Imaging Method with Pixel and Patch Level."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reconstructing an image from noisy and incomplete measurements is a central task in several image processing applications. In recent years, state-of-the-art reconstruction methods have been developed based on recent advances in deep learning. Especially for highly underdetermined problems, maintaining data consistency is a key goal. This can be achieved either by iterative network architectures or by a subsequent projection of the network reconstruction. However, for such approaches to be used in safety-critical domains such as medical imaging, the network reconstruction should not only provide the user with a reconstructed image, but also with some level of confidence in the reconstruction. In order to meet these two key requirements, this paper combines deep null-space networks with uncertainty quantification. Evaluation of the proposed method includes image reconstruction from undersampled Radon measurements on a toy CT dataset and accelerated MRI reconstruction on the fastMRI dataset.
This work is the first approach to solving inverse problems that additionally models data-dependent uncertainty by estimating an input-dependent scale map, providing a robust assessment of reconstruction quality.", "output": "Uncertainty-Aware Null Space Networks for Data-Consistent Image Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "CLIP (Contrastive Language-Image Pretraining) is well-developed for open-vocabulary zero-shot image-level recognition, while its applications in pixel-level tasks are less investigated, where most efforts directly adopt CLIP features without deliberative adaptations. In this work, we first demonstrate the necessity of image-pixel CLIP feature adaption, then provide Multi-View Prompt learning (MVP-SEG) as an effective solution to achieve image-pixel adaptation and to solve open-vocabulary semantic segmentation. Concretely, MVP-SEG deliberately learns multiple prompts trained by our Orthogonal Constraint Loss (OCLoss), by which each prompt is supervised to exploit CLIP feature on different object parts, and collaborative segmentation masks generated by all prompts promote better segmentation. Moreover, MVP-SEG introduces Global Prompt Refining (GPR) to further eliminate class-wise segmentation noise. Experiments show that the multi-view prompts learned from seen categories have strong generalization to unseen categories, and MVP-SEG+ which combines the knowledge transfer stage significantly outperforms previous methods on several benchmarks.
Moreover, qualitative results justify that MVP-SEG does lead to better focus on different local parts.", "output": "MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Depth Estimation has wide reaching applications in the field of Computer vision such as target tracking, augmented reality, and self-driving cars. The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input. The traditional depth estimation methods are based on depth cues and used concepts like epipolar geometry. With the evolution of Convolutional Neural Networks, depth estimation has undergone tremendous strides. In this project, our aim is to explore possible extensions to existing SoTA Deep Learning based Depth Estimation Models and to see whether performance metrics could be further improved. In a broader sense, we are looking at the possibility of implementing Pose Estimation, Efficient Sub-Pixel Convolution Interpolation, Semantic Segmentation Estimation techniques to further enhance our proposed architecture and to provide fine-grained and more globally coherent depth map predictions.
We also plan to do away with camera intrinsic parameters during training and apply weather augmentations to further generalize our model.", "output": "Self-Supervised Learning based Depth Estimation from Monocular Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The limited ability of Convolutional Neural Networks to generalize to images from previously unseen domains is a major limitation, in particular, for safety-critical clinical tasks such as dermoscopic skin cancer classification. In order to translate CNN-based applications into the clinic, it is essential that they are able to adapt to domain shifts. Such new conditions can arise through the use of different image acquisition systems or varying lighting conditions. In dermoscopy, shifts can also occur as a change in patient age or occurence of rare lesion localizations (e.g. palms). These are not prominently represented in most training datasets and can therefore lead to a decrease in performance. In order to verify the generalizability of classification models in real world clinical settings it is crucial to have access to data which mimics such domain shifts. To our knowledge no dermoscopic image dataset exists where such domain shifts are properly described and quantified. We therefore grouped publicly available images from ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed the performance on these domains with and without an unsupervised domain adaptation technique. We observed that in most of our grouped domains, domain shifts in fact exist.
Based on our results, we believe these datasets to be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.", "output": "Domain shifts in dermoscopic skin cancer datasets: Evaluation of essential limitations for clinical translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural radiance field (NeRF) has become a popular 3D representation method for human avatar reconstruction due to its high-quality rendering capabilities, e.g., regarding novel views and poses. However, previous methods for editing the geometry and appearance of the avatar only allow for global editing through body shape parameters and 2D texture maps. In this paper, we propose a new approach named \\textbf{U}nified \\textbf{V}olumetric \\textbf{A}vatar (\\textbf{UVA}) that enables local and independent editing of both geometry and texture, while retaining the ability to render novel views and poses. UVA transforms each observation point to a canonical space using a skinning motion field and represents geometry and texture in separate neural fields. Each field is composed of a set of structured latent codes that are attached to anchor nodes on a deformable mesh in canonical space and diffused into the entire space via interpolation, allowing for local editing. To address spatial ambiguity in code interpolation, we use a local signed height indicator. We also replace the view-dependent radiance color with a pose-dependent shading factor to better represent surface illumination in different poses. Experiments on multiple human avatars demonstrate that our UVA achieves competitive results in novel view synthesis and novel pose rendering while enabling local and independent editing of geometry and appearance.
The source code will be released.", "output": "UVA: Towards Unified Volumetric Avatar for View Synthesis, Pose rendering, Geometry and Texture Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning new classes without forgetting is crucial for real-world applications for a classification model. Vision Transformers (ViT) recently achieve remarkable performance in Class Incremental Learning (CIL). Previous works mainly focus on block design and model expansion for ViTs. However, in this paper, we find that when the ViT is incrementally trained, the attention layers gradually lose concentration on local features. We call this interesting phenomenon as \\emph{Locality Degradation} in ViTs for CIL. Since the low-level local information is crucial to the transferability of the representation, it is beneficial to preserve the locality in attention layers. In this paper, we encourage the model to preserve more local information as the training procedure goes on and devise a Locality-Preserved Attention (LPA) layer to emphasize the importance of local features. Specifically, we incorporate the local information directly into the vanilla attention and control the initial gradients of the vanilla attention by weighting it with a small initial value. Extensive experiments show that the representations facilitated by LPA capture more low-level general information which is easier to transfer to follow-up tasks.
The improved model gets consistently better performance on CIFAR100 and ImageNet100.", "output": "Preserving Locality in Vision Transformers for Class Incremental Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we study a real-world JPEG image restoration problem with bit errors on the encrypted bitstream. The bit errors bring unpredictable color casts and block shifts on decoded image contents, which cannot be resolved by existing image restoration methods mainly relying on pre-defined degradation models in the pixel domain. To address these challenges, we propose a robust JPEG decoder, followed by a two-stage compensation and alignment framework to restore bitstream-corrupted JPEG images. Specifically, the robust JPEG decoder adopts an error-resilient mechanism to decode the corrupted JPEG bitstream. The two-stage framework is composed of the self-compensation and alignment (SCA) stage and the guided-compensation and alignment (GCA) stage. The SCA adaptively performs block-wise image color compensation and alignment based on the estimated color and block offsets via image content similarity. The GCA leverages the extracted low-resolution thumbnail from the JPEG header to guide full-resolution pixel-wise image restoration in a coarse-to-fine manner. It is achieved by a coarse-guided pix2pix network and a refine-guided bi-directional Laplacian pyramid fusion network. We conduct experiments on three benchmarks with varying degrees of bit error rates. Experimental results and ablation studies demonstrate the superiority of our proposed method.
The code will be released at ", "output": "Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural pointing understanding method based on two key contributions. The first is the introduction of a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset. DP Dataset consists of more than 2 million frames of over 33 people pointing in various styles annotated for each frame with pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network which fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding.", "output": "DeePoint: Pointing Recognition and Direction Estimation From A Fixed View."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered.
This is inherently inapt for classifying variable-length coding files whose symbols are represented as the variable number of bits. Conversely, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at", "output": "A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Efficient deep learning-based approaches have achieved remarkable performance in single image super-resolution. However, recent studies on efficient super-resolution have mainly focused on reducing the number of parameters and floating-point operations through various network designs. Although these methods can decrease the number of parameters and floating-point operations, they may not necessarily reduce actual running time. To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance.
Specifically, we leverage enhanced high-resolution output as additional supervision to improve the learning ability of lightweight student networks. Upon convergence of the student network, we further simplify our network structure to a more lightweight level using reparameterization techniques and iterative network pruning. Meanwhile, we adopt an effective lightweight network training strategy that combines multi-anchor distillation and progressive learning, enabling the lightweight network to achieve outstanding performance. Ultimately, our proposed method achieves the fastest inference time among all participants in the NTIRE 2023 efficient super-resolution challenge while maintaining competitive super-resolution performance. Additionally, extensive experiments are conducted to demonstrate the effectiveness of the proposed components. The results show that our approach achieves comparable performance in representative dataset DIV2K, both qualitatively and quantitatively, with faster inference and fewer number of network parameters.", "output": "DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The exploitation of visible spectrum datasets has led deep networks to show remarkable success. However, real-world tasks include low-lighting conditions which arise performance bottlenecks for models trained on large-scale RGB image datasets. Thermal IR cameras are more robust against such conditions. Therefore, the usage of thermal imagery in real-world applications can be useful. Unsupervised domain adaptation (UDA) allows transferring information from a source domain to a fully unlabeled target domain. Despite substantial improvements in UDA, the performance gap between UDA and its supervised learning counterpart remains significant.
By picking a small number of target samples to annotate and using them in training, active domain adaptation tries to mitigate this gap with minimum annotation expense. We propose an active domain adaptation method in order to examine the efficiency of combining the visible spectrum and thermal imagery modalities. When the domain gap is considerably large as in the visible-to-thermal task, we may conclude that the methods without explicit domain alignment cannot achieve their full potential. To this end, we propose a spectral transfer guided active domain adaptation method to select the most informative unlabeled target samples while aligning source and target domains. We used the large-scale visible spectrum dataset MS-COCO as the source domain and the thermal dataset FLIR ADAS as the target domain to present the results of our method. Extensive experimental evaluation demonstrates that our proposed method outperforms the state-of-the-art active domain adaptation methods. The code and models are publicly available.", "output": "Spectral Transfer Guided Active Domain Adaptation For Thermal Imagery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consistent with the assessment results by human experts. Furthermore, the overall quality of a scan and the correlation between the quality of frames should not be overlooked.
In this work, we propose a reinforcement learning framework powered by two hierarchical agents that collaboratively learn to perform both frame-level and video-level quality assessments. It is equipped with a specially-designed reward mechanism that considers temporal dependency among frame quality and only requires sparse binary annotations to train. Experimental results on a challenging fetal brain dataset verify that the proposed framework could perform dual-level quality assessment and its predictions correlate well with the subjective assessment results.", "output": "Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Low-light image enhancement (LLIE) investigates how to improve illumination and produce normal-light images. The majority of existing methods improve low-light images via a global and uniform manner, without taking into account the semantic information of different regions. Without semantic priors, a network may easily deviate from a region's original color. To address this issue, we propose a novel semantic-aware knowledge-guided framework (SKF) that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model. We concentrate on incorporating semantic knowledge from three key aspects: a semantic-aware embedding module that wisely integrates semantic priors in feature representation space, a semantic-guided color histogram loss that preserves color consistency of various instances, and a semantic-guided adversarial loss that produces more natural textures by semantic priors. Our SKF is appealing in acting as a general framework in the LLIE task. 
Extensive experiments show that models equipped with the SKF significantly outperform the baselines on multiple datasets and our SKF generalizes well to different models and scenes. The code is available at Semantic-Aware-Low-Light-Image-Enhancement.", "output": "Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Indirect Time of Flight LiDARs can indirectly calculate the scene's depth from the phase shift angle between transmitted and received laser signals with amplitudes modulated at a predefined frequency. Unfortunately, this method generates ambiguity in calculated depth when the phase shift angle value exceeds $2\\pi$. Current state-of-the-art methods use raw samples generated using two distinct modulation frequencies to overcome this ambiguity problem. However, this comes at the cost of increasing laser components' stress and raising their temperature, which reduces their lifetime and increases power consumption. In our work, we study two different methods to recover the entire depth range of the LiDAR using fewer raw data sample shots from a single modulation frequency with the support of the sensor's gray scale output to reduce the laser components' stress and power consumption.", "output": "Near Field iToF LIDAR Depth Improvement from Limited Number of Shots."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. 
The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique submissions that outperformed the provided SotA baseline on any of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones. These results represent significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance and overall natural image accuracy.", "output": "The Second Monocular Depth Estimation Challenge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA). 
In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at:", "output": "Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating synthetic datasets for training face recognition models is challenging because dataset generation entails more than creating high fidelity images. It involves generating multiple images of the same subjects under different factors (\\textit{e.g.}, variations in pose, illumination, expression, aging and occlusion) which follow the real image conditional distribution. Previous works have studied the generation of synthetic datasets using GAN or 3D models. In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. These two conditions provide a direct way to control the inter-class and intra-class variations. To this end, we propose a Dual Condition Face Generator (DCFace) based on a diffusion model. 
Our novel Patch-wise style extractor and Time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control. Face recognition models trained on synthetic images from the proposed DCFace provide higher verification accuracies compared to previous works by $6.11\\%$ on average in $4$ out of $5$ test datasets, LFW, CFP-FP, CPLFW, AgeDB and CALFW. Code is available at", "output": "DCFace: Synthetic Face Generation with Dual Condition Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The highly structured energy landscape of the loss as a function of parameters for deep neural networks makes it necessary to use sophisticated optimization strategies in order to discover (local) minima that guarantee reasonable performance. Overcoming less suitable local minima is an important prerequisite and often momentum methods are employed to achieve this. As in other non-local optimization procedures, this however creates the necessity to balance between exploration and exploitation. In this work, we suggest an event-based control mechanism for switching from exploration to exploitation based on reaching a predefined reduction of the loss function. As we give the momentum method a port Hamiltonian interpretation, we apply the 'heavy ball with friction' interpretation and trigger breaking (or friction) when achieving certain goals. 
We benchmark our method against standard stochastic gradient descent and provide experimental evidence for improved performance of deep neural networks when our strategy is applied.", "output": "Who breaks early, looses: goal oriented training of deep neural networks based on port Hamiltonian dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Structured reconstruction is a non-trivial dense prediction problem, which extracts structural information (e.g., building corners and edges) from a raster image, then reconstructs it to a 2D planar graph accordingly. Compared with common segmentation or detection problems, it significantly relies on the capability of leveraging holistic geometric information for structural reasoning. Current transformer-based approaches tackle this challenging problem in a two-stage manner, which detect corners in the first model and classify the proposed edges (corner-pairs) in the second model. However, they separate the two stages into different models and only share the backbone encoder. Unlike the existing modeling strategies, we present an enhanced corner representation method: 1) It fuses knowledge between the corner detection and edge prediction by sharing features at different granularity; 2) Corner candidates are proposed in four heatmap channels w.r.t. its direction. Both qualitative and quantitative evaluations demonstrate that our proposed method can better reconstruct fine-grained structures, such as adjacent corners and tiny edges. 
Consequently, it outperforms the state-of-the-art model by +1.9%@F-1 on Corner and +3.0%@F-1 on Edge.", "output": "CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic and periodic recompiling of building databases with up-to-date high-resolution images has become a critical requirement for rapidly developing urban environments. However, the architecture of most existing approaches for change extraction attempts to learn features related to changes but ignores objectives related to buildings. This inevitably leads to the generation of significant pseudo-changes, due to factors such as seasonal changes in images and the inclination of building fa\\c{c}ades. To alleviate the above-mentioned problems, we developed a contrastive learning approach by validating historical building footprints against single up-to-date remotely sensed images. This contrastive learning strategy allowed us to inject the semantics of buildings into a pipeline for the detection of changes, which is achieved by increasing the distinguishability of features of buildings from those of non-buildings. In addition, to reduce the effects of inconsistencies between historical building polygons and buildings in up-to-date images, we employed a deformable convolutional neural network to learn offsets intuitively. In summary, we formulated a multi-branch building extraction method that identifies newly constructed and removed buildings, respectively. To validate our method, we conducted comparative experiments using the public Wuhan University building change detection dataset and a more practical dataset named SI-BU that we established. Our method achieved F1 scores of 93.99% and 70.74% on the above datasets, respectively. 
Moreover, when the data of the public dataset were divided in the same manner as in previous related studies, our method achieved an F1 score of 94.63%, which surpasses that of the state-of-the-art method.", "output": "BCE-Net: Reliable Building Footprints Change Extraction based on Historical Map and Up-to-Date Images using Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a DETR-based method for cross-domain weakly supervised object detection (CDWSOD), aiming at adapting the detector from the source to the target domain through weak supervision. We think DETR has strong potential for CDWSOD due to an insight: the encoder and the decoder in DETR are both based on the attention mechanism and are thus capable of aggregating semantics across the entire image. The aggregation results, i.e., image-level predictions, can naturally exploit the weak supervision for domain alignment. Thus motivated, we propose DETR with additional Global Aggregation (DETR-GA), a CDWSOD detector that simultaneously makes \"instance-level + image-level\" predictions and utilizes \"strong + weak\" supervisions. The key point of DETR-GA is very simple: for the encoder / decoder, we respectively add multiple class queries / a foreground query to aggregate the semantics into image-level predictions. Our query-based aggregation has two advantages. First, in the encoder, the weakly-supervised class queries are capable of roughly locating the corresponding positions and excluding the distraction from non-relevant regions. Second, through our design, the object queries and the foreground query in the decoder share consensus on the class semantics, therefore making the strong and weak supervision mutually benefit each other for domain alignment. 
Extensive experiments on four popular cross-domain benchmarks show that DETR-GA significantly improves CDWSOD and advances the state of the art (e.g., 29.0% --> 79.4% mAP on the PASCAL VOC --> Clipart_all dataset).", "output": "DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion probabilistic models have been successful in generating high-quality and diverse images. However, traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements, making them less practical for edge devices. Previous approaches for generative adversarial networks proposed a patch-based method that uses positional encoding and global content information. Nevertheless, designing a patch-based approach for diffusion probabilistic models is non-trivial. In this paper, we present a diffusion probabilistic model that generates images on a patch-by-patch basis. We propose two conditioning methods for patch-based generation. First, we propose position-wise conditioning using a one-hot representation to ensure patches are in proper positions. Second, we propose Global Content Conditioning (GCC) to ensure patches have coherent content when concatenated together. We evaluate our model qualitatively and quantitatively on the CelebA and LSUN bedroom datasets and demonstrate a moderate trade-off between maximum memory consumption and generated image quality. 
Specifically, when an entire image is divided into 2 x 2 patches, our proposed approach can reduce the maximum memory consumption by half while maintaining comparable image quality.", "output": "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS leverages the rich generative prior of text-to-image diffusion models and can be used as a loss term in an optimization problem to steer an image towards a desired direction dictated by a text. DDS utilizes the Score Distillation Sampling (SDS) mechanism for the purpose of image editing. We show that using only SDS often produces non-detailed and blurry outputs due to noisy gradients. To address this issue, DDS uses a prompt that matches the input image to identify and remove undesired erroneous directions of SDS. Our key premise is that SDS should be zero when calculated on pairs of matched prompts and images, meaning that if the score is non-zero, its gradients can be attributed to the erroneous component of SDS. Our analysis demonstrates the competence of DDS for text-based image-to-image translation. We further show that DDS can be used to train an effective zero-shot image translation model. 
Experimental results indicate that DDS outperforms existing methods in terms of stability and quality, highlighting its potential for real-world applications in text-based image editing.", "output": "Delta Denoising Score."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of the generalization of the relations learned. In recent years, several innovations in DNNs have been developed in order to enable learning abstract relations from images. In this work, we systematically evaluate a series of DNNs that integrate mechanisms such as slot attention, recurrently guided attention, and external memory, in the simplest possible visual reasoning task: deciding whether two objects are the same or different. We found that, although some models performed better than others in generalizing the same-different relation to specific types of images, no model was able to generalize this relation across the board. We conclude that abstract visual reasoning remains largely an unresolved challenge for DNNs.", "output": "The role of object-centric representations, guided attention, and external memory on generalizing visual relations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Alzheimer's Disease (AD), which is the most common cause of dementia, is a progressive disease preceded by Mild Cognitive Impairment (MCI). Early detection of the disease is crucial for making treatment decisions. 
However, most of the literature on computer-assisted detection of AD focuses on classifying brain images into one of three major categories: healthy, MCI, and AD; or categorising MCI patients into one of (1) progressive: those who progress from MCI to AD at a future examination time during a given study period, and (2) stable: those who stay as MCI and never progress to AD. This misses the opportunity to accurately identify the trajectory of progressive MCI patients. In this paper, we revisit the brain image classification task for AD identification and re-frame it as an ordinal classification task to predict how close a patient is to the severe AD stage. To this end, we select progressive MCI patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and construct an ordinal dataset with a prediction target that indicates the time to progression to AD. We train a siamese network model to predict the time to onset of AD based on MRI brain images. We also propose a weighted variety of siamese networks and compare its performance to a baseline model. Our evaluations show that incorporating a weighting factor to siamese networks brings considerable performance gain at predicting how close input brain MRI images are to progressing to AD.", "output": "Weighted Siamese Network to Predict the Time to Onset of Alzheimer's Disease from MRI Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose SampleDepth, a Convolutional Neural Network (CNN), that is suited for an adaptive LiDAR. Typically, LiDAR sampling strategy is pre-defined, constant and independent of the observed scene. 
Instead of letting a LiDAR sample the scene in this agnostic fashion, SampleDepth determines, adaptively, where it is best to sample the current frame. To do that, SampleDepth uses depth samples from previous time steps to predict a sampling mask for the current frame. Crucially, SampleDepth is trained to optimize the performance of a depth completion downstream task. SampleDepth is evaluated on two different depth completion networks and two LiDAR datasets, KITTI Depth Completion and the newly introduced synthetic dataset, SHIFT. We show that SampleDepth is effective and suitable for different depth completion downstream tasks.", "output": "Prior based Sampling for Adaptive LiDAR."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-organ segmentation, which identifies and separates different organs in medical images, is a fundamental task in medical image analysis. Recently, the immense success of deep learning motivated its wide adoption in multi-organ segmentation tasks. However, due to expensive labor costs and expertise, the availability of multi-organ annotations is usually limited and hence poses a challenge in obtaining sufficient training data for deep learning-based methods. In this paper, we aim to address this issue by combining off-the-shelf single-organ segmentation models to develop a multi-organ segmentation model on the target dataset, which helps get rid of the dependence on annotated data for multi-organ segmentation. To this end, we propose a novel dual-stage method that consists of a Model Adaptation stage and a Model Ensemble stage. The first stage enhances the generalization of each off-the-shelf segmentation model on the target domain, while the second stage distills and integrates knowledge from multiple adapted single-organ segmentation models. 
Extensive experiments on four abdomen datasets demonstrate that our proposed method can effectively leverage off-the-shelf single-organ segmentation models to obtain a tailored model for multi-organ segmentation with high accuracy.", "output": "Tailored Multi-Organ Segmentation with Model Adaptation and Ensemble."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optical flow provides information on relative motion that is an important component in many computer vision pipelines. Neural networks provide high accuracy optical flow, yet their complexity is often prohibitive for application at the edge or in robots, where efficiency and latency play a crucial role. To address this challenge, we build on the latest developments in event-based vision and spiking neural networks. We propose a new network architecture, inspired by Timelens, that improves the state-of-the-art self-supervised optical flow accuracy when operated both in spiking and non-spiking mode. To implement a real-time pipeline with a physical event camera, we propose a methodology for principled model simplification based on activity and latency analysis. We demonstrate high speed optical flow prediction with almost two orders of magnitude reduced complexity while maintaining the accuracy, opening the path for real-time deployments.", "output": "Neuromorphic Optical Flow and Real-time Implementation with Event Cameras."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point clouds are widely regarded as one of the best dataset types for urban mapping purposes. Hence, point cloud datasets are commonly investigated as benchmark types for various urban interpretation methods. 
Yet, few researchers have addressed the use of point cloud benchmarks for fa\\c{c}ade segmentation. Robust fa\\c{c}ade segmentation is becoming a key factor in various applications ranging from simulating autonomous driving functions to preserving cultural heritage. In this work, we present a method of enriching existing point cloud datasets with fa\\c{c}ade-related classes that have been designed to facilitate fa\\c{c}ade segmentation testing. We propose how to efficiently extend existing datasets and comprehensively assess their potential for fa\\c{c}ade segmentation. We use the method to create the TUM-FA\\c{C}ADE dataset, which extends the capabilities of TUM-MLS-2016. Not only can TUM-FA\\c{C}ADE facilitate the development of point-cloud-based fa\\c{c}ade segmentation tasks, but our procedure can also be applied to enrich further datasets.", "output": "TUM-FA\\c{C}ADE: Reviewing and enriching point cloud benchmarks for fa\\c{c}ade segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cancer is a highly heterogeneous condition that can occur almost anywhere in the human body. 18F-fluorodeoxyglucose is an imaging modality commonly used to detect cancer due to its high sensitivity and clear visualisation of the pattern of metabolic activity. Nonetheless, as cancer is highly heterogeneous, it is challenging to train general-purpose discriminative cancer detection models, with data availability and disease complexity often cited as a limiting factor. Unsupervised anomaly detection models have been suggested as a putative solution. These models learn a healthy representation of tissue and detect cancer by predicting deviations from the healthy norm, which requires models capable of accurately learning long-range interactions between organs and their imaging patterns with high levels of expressivity. 
Such characteristics are suitably satisfied by transformers, which have been shown to generate state-of-the-art results in unsupervised anomaly detection by training on normal data. This work expands upon such approaches by introducing multi-modal conditioning of the transformer via cross-attention, i.e. supplying anatomical reference from paired CT. Using 294 whole-body PET/CT samples, we show that our anomaly detection method is robust and capable of achieving accurate cancer localization results even in cases where normal training data is unavailable. In addition, we show the efficacy of this approach on out-of-sample data, showcasing the generalizability of this approach with limited training data. Lastly, we propose to combine model uncertainty with a new kernel density estimation approach, and show that it provides clinically and statistically significant improvements when compared to the classic residual-based anomaly maps. Overall, a superior performance is demonstrated against leading state-of-the-art alternatives, drawing attention to the potential of these approaches.", "output": "Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For visual estimation of optical flow, a crucial function for many vision tasks, unsupervised learning, using the supervision of view synthesis, has emerged as a promising alternative to supervised methods, since ground-truth flow is not readily available in many cases. However, unsupervised learning is likely to be unstable when pixel tracking is lost due to occlusion and motion blur, or the pixel matching is impaired due to variation in image content and spatial structure over time. 
In natural environments, dynamic occlusion or object variation is a relatively slow temporal process spanning several frames. We, therefore, explore the optical flow estimation from multiple-frame sequences of dynamic scenes, whereas most of the existing unsupervised approaches are based on temporal static models. We handle the unsupervised optical flow estimation with a temporal dynamic model by introducing a spatial-temporal dual recurrent block based on the predictive coding structure, which feeds the previous high-level motion prior to the current optical flow estimator. Assuming temporal smoothness of optical flow, we use motion priors of the adjacent frames to provide more reliable supervision of the occluded regions. To grasp the essence of challenging scenes, we simulate various scenarios across long sequences, including dynamic occlusion, content variation, and spatial variation, and adopt self-supervised distillation to make the model understand the object's motion patterns in a prolonged dynamic environment. Experiments on the KITTI 2012, KITTI 2015, Sintel Clean, and Sintel Final datasets demonstrate the effectiveness of our methods on unsupervised optical flow estimation. The proposal achieves state-of-the-art performance with advantages in memory overhead.", "output": "Unsupervised Learning Optical Flow in Multi-frame Dynamic Environment Using Temporal Dynamic Modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Solar activity is one of the main drivers of variability in our solar system and the key source of space weather phenomena that affect Earth and near-Earth space. The extensive record of high resolution extreme ultraviolet (EUV) observations from the Solar Dynamics Observatory (SDO) offers an unprecedented, very large dataset of solar images. 
In this work, we make use of this comprehensive dataset to investigate the capabilities of current state-of-the-art generative models to accurately capture the data distribution behind the observed solar activity states. Starting from StyleGAN-based methods, we uncover severe deficits of this model family in handling fine-scale details of solar images when training on high resolution samples, contrary to training on natural face images. When switching to the diffusion-based generative model family, we observe strong improvements in fine-scale detail generation. For the GAN family, we are able to achieve similar improvements in fine-scale generation when turning to ProjectedGANs, which use multi-scale discriminators with a pre-trained frozen feature extractor. We conduct ablation studies to clarify mechanisms responsible for proper fine-scale handling. Using distributed training on supercomputers, we are able to train generative models for up to 1024x1024 resolution that produce high quality samples indistinguishable to human experts, as suggested by the evaluation we conduct. We make all code, models and workflows used in this study publicly available at \\url{", "output": "A Comparative Study on Generative Models for High Resolution Solar Observation Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, media reports have called out bias and racism in face recognition technology. We review experimental results exploring several speculated causes for asymmetric cross-demographic performance. We consider accuracy differences as represented by variations in non-mated (impostor) and / or mated (genuine) distributions for 1-to-1 face matching. 
Possible causes explored include differences in skin tone, face size and shape, imbalance in number of identities and images in the training data, and amount of face visible in the test data (\"face pixels\"). We find that demographic differences in face pixel information of the test images appear to most directly impact the resultant differences in face recognition accuracy.", "output": "Exploring Causes of Demographic Variations In Face Recognition Accuracy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.", "output": "DINOv2: Learning Robust Visual Features without Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding semantic scene segmentation of urban scenes captured from the Unmanned Aerial Vehicles (UAV) perspective plays a vital role in building a perception model for UAV. With the limitations of large-scale densely labeled data, semantic scene segmentation for UAV views requires a broad understanding of an object from both its top and side views. Adapting from well-annotated autonomous driving data to unlabeled UAV data is challenging due to the cross-view differences between the two data types. Our work proposes a novel Cross-View Adaptation (CROVIA) approach to effectively adapt the knowledge learned from on-road vehicle views to UAV views. First, a novel geometry-based constraint to cross-view adaptation is introduced based on the geometry correlation between views. Second, cross-view correlations from image space are effectively transferred to segmentation space without any requirement of paired on-road and UAV view data via a new Geometry-Constraint Cross-View (GeiCo) loss. Third, the multi-modal bijective networks are introduced to enforce the global structural modeling across views.
Experimental results on new cross-view adaptation benchmarks introduced in this work, i.e., SYNTHIA to UAVID and GTA5 to UAVID, show the State-of-the-Art (SOTA) performance of our approach over prior adaptation methods", "output": "CROVIA: Seeing Drone Scenes from Car Perspective via Cross-View Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms such as the Nvidia Jetson NX.", "output": "EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vegetation structure mapping is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation.
Repeat measurements of these data allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of tree canopy height and crown projected area at a high spatial resolution are also important for monitoring carbon fluxes and assessing tree-based land uses, since forest structures can be highly spatially heterogeneous, especially in agroforestry systems. Very high resolution satellite imagery (less than one meter (1m) ground sample distance) makes it possible to extract information at the tree level while allowing monitoring at a very large scale. This paper presents the first high-resolution canopy height map concurrently produced for multiple sub-national jurisdictions. Specifically, we produce canopy height maps for the states of California and São Paulo, at sub-meter resolution, a significant improvement over the ten meter (10m) resolution of previous Sentinel / GEDI based worldwide maps of canopy height. The maps are generated by applying a vision transformer to features extracted from a self-supervised model in Maxar imagery from 2017 to 2020, and are trained against aerial lidar and GEDI observations. We evaluate the proposed maps with set-aside validation lidar data as well as by comparing with other remotely sensed maps and field-collected data, and find our model produces an average Mean Absolute Error (MAE) within set-aside validation areas of 3.0 meters.", "output": "Sub-meter resolution canopy height maps using self-supervised learning and a vision transformer trained on Aerial and GEDI Lidar."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, pre-trained point cloud models have found extensive applications in downstream tasks like object classification.
However, these tasks often require full fine-tuning of models and lead to storage-intensive procedures, thus limiting the real applications of pre-trained models. Inspired by the great success of visual prompt tuning (VPT) in vision, we attempt to explore prompt tuning, which serves as an efficient alternative to full fine-tuning for large-scale models, to point cloud pre-trained models to reduce storage costs. However, it is non-trivial to apply the traditional static VPT to point clouds, owing to the distribution diversity of point cloud data. For instance, the scanned point clouds exhibit various types of missing or noisy points. To address this issue, we propose an Instance-aware Dynamic Prompt Tuning (IDPT) for point cloud pre-trained models, which utilizes a prompt module to perceive the semantic prior features of each instance. This semantic prior facilitates the learning of unique prompts for each instance, thus enabling downstream tasks to robustly adapt to pre-trained point cloud models. Notably, extensive experiments conducted on downstream tasks demonstrate that IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters, thus significantly reducing the storage pressure. Code is available at url{", "output": "Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pedestrian attribute recognition (PAR) has received increasing attention because of its wide application in video surveillance and pedestrian analysis. Extracting robust feature representation is one of the key challenges in this task. The existing methods mainly use the convolutional neural network (CNN) as the backbone network to extract features. However, these methods mainly focus on small discriminative regions while ignoring the global perspective.
To overcome these limitations, we propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules. In the feature extraction module, we build a transformer-based strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks compared with the existing CNN-based baseline methods. In the feature processing module, we propose an effective data augmentation strategy named batch random mask (BRM) block to reinforce the attentive feature learning of random patches. Furthermore, we propose a multi-attribute center loss (MACL) to enhance the inter-attribute discriminability in the feature representations. In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss (MCVL) that enables the network to exploit the viewpoint information. In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions. The above modules interact and jointly learn a highly discriminative feature space, and supervise the generation of the final features. Extensive experimental results show that the proposed PARFormer network performs well compared to the state-of-the-art methods on several public datasets, including PETA, RAP, and PA100K. Code will be released at", "output": "PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) dataset is a public, clinical, multimodal brain MRI dataset consisting of 560 brain MRIs from 412 patients with expert annotations of 5136 brain metastases.
Data consists of registered and skull stripped T1 post-contrast, T1 pre-contrast, FLAIR and subtraction (T1 pre-contrast - T1 post-contrast) images and voxelwise segmentations of enhancing brain metastases in NifTI format. The dataset also includes patient demographics, surgical status and primary cancer types. The UCSF-BMSR has been made publicly available in the hopes that researchers will use these data to push the boundaries of AI applications for brain metastases.", "output": "The University of California San Francisco, Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses. Recent advances in deep learning have enabled the localization using monocular visual cameras. While structure from motion (SfM) predicts the absolute pose from a point cloud, absolute pose regression (APR) methods learn a semantic understanding of the environment through neural networks. However, both fields face challenges caused by the environment such as motion blur, lighting changes, repetitive patterns, and feature-less structures. This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods. The optical flow between consecutive images is computed using the Lucas-Kanade algorithm, and the relative pose is predicted using an auxiliary small recurrent convolutional network. The fusion of absolute and relative poses is a complex task due to the mismatch between the global and local coordinate systems.
State-of-the-art methods fusing absolute and relative poses use pose graph optimization (PGO) to regularize the absolute pose predictions using relative poses. In this work, we propose recurrent fusion networks to optimally align absolute and relative pose predictions to improve the absolute pose prediction. We evaluate eight different recurrent units and construct a simulation environment to pre-train the APR and RPR networks for better generalized training. Additionally, we record a large database of different scenarios in a challenging large-scale indoor environment that mimics a warehouse with transportation robots. We conduct hyperparameter searches and experiments to show the effectiveness of our recurrent fusion method compared to PGO.", "output": "Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. Our Dynamic MobileFormer effectively utilizes the advantages of Dynamic MobileNet (MobileNet equipped with dynamic convolution) using global information from light-weight attention. A Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features, making it computationally efficient. And a bridge between Dynamic MobileNet and Transformer allows for bidirectional integration of local and global features. We also simplify the optimization process of vanilla dynamic convolution by splitting the convolution kernel into an input-agnostic kernel and an input-dependent kernel. This allows for optimization in a wider kernel space, resulting in enhanced capacity. By integrating lightweight attention and enhanced dynamic convolution, our Dynamic Mobile-Former achieves
not only high efficiency, but also strong performance. We benchmark the Dynamic Mobile-Former on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection, and instance segmentation. For example, our DMF hits the top-1 accuracy of 79.4% on ImageNet-1K, much higher than PVT-Tiny by 4.3% with only 1/4 FLOPs. Additionally, our proposed DMF-S model performed well on challenging vision datasets such as COCO, achieving a 39.0% mAP, which is 1% higher than that of the Mobile-Former 508M model, despite using 3 GFLOPs less computations. Code and models are available at ", "output": "Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection has seen remarkable progress in recent years with the introduction of Convolutional Neural Networks (CNN). Object detection is a multi-task learning problem where both the position of the objects in the images as well as their classes need to be correctly identified. The idea here is to maximize the overlap between the ground-truth bounding boxes and the predictions i.e. the Intersection over Union (IoU). In the scope of work seen currently in this domain, IoU is approximated by using the Huber loss as a proxy but this indirect method does not leverage the IoU information and treats the bounding box as four independent, unrelated terms of regression. This is not true for a bounding box where the four coordinates are highly correlated and hold a semantic meaning when taken together. The direct optimization of the IoU is not possible due to its non-convex and non-differentiable nature. In this paper, we have formulated a novel loss namely, the Smooth IoU, which directly optimizes the IoUs for the bounding boxes.
This loss has been evaluated on the Oxford IIIT Pets, Udacity self-driving car, PASCAL VOC, and VWFS Car Damage datasets and has shown performance gains over the standard Huber loss.", "output": "Directly Optimizing IoU for Bounding Box Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Domain generalization (DG), aiming at models able to work on multiple unseen domains, is a must-have characteristic of general artificial intelligence. DG based on single source domain training data is more challenging due to the lack of comparable information to help identify domain invariant features. In this paper, it is determined that the domain invariant features could be contained in the single source domain training samples, then the task is to find proper ways to extract such domain invariant features from the single source domain samples. An assumption is made that the domain invariant features are closely related to the frequency. Then, a new method that learns through multiple frequency domains is proposed. The key idea is, dividing the frequency domain of each original image into multiple subdomains, and learning features in the subdomain by a designed two branches network. In this way, the model is enforced to learn features from more samples of the specifically limited spectrum, which increases the possibility of obtaining the domain invariant features that might have previously been defiladed by easily learned features. Extensive experimental investigation reveals that 1) frequency decomposition can help the model learn features that are difficult to learn.
2) the proposed method outperforms the state-of-the-art methods of single-source domain generalization.", "output": "Frequency Decomposition to Tap the Potential of Single Domain for Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The strength of machine learning models stems from their ability to learn complex function approximations from data; however, this strength also makes training deep neural networks challenging. Notably, the complex models tend to memorize the training data, which results in poor regularization performance on test data. The regularization techniques such as L1, L2, dropout, etc. are proposed to reduce the overfitting effect; however, they bring in additional hyperparameters tuning complexity. These methods also fall short when the inter-class similarity is high due to the underlying data distribution, leading to a less accurate model. In this paper, we present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation. We create phantom embeddings from a subset of homogenous samples and use these phantom embeddings to decrease the inter-class similarity of instances in their latent embedding space. The resulting models generalize better as a combination of their embedding and regularize them without requiring an expensive hyperparameter search.
We evaluate our method on two popular and challenging image classification datasets (CIFAR and FashionMNIST) and show how our approach outperforms the standard baselines while displaying better training behavior.", "output": "Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The standard non-invasive imaging technique used to assess the severity and extent of Coronary Artery Disease (CAD) is Coronary Computed Tomography Angiography (CCTA). However, manual grading of each patient's CCTA according to the CAD-Reporting and Data System (CAD-RADS) scoring is time-consuming and operator-dependent, especially in borderline cases. This work proposes a fully automated, and visually explainable, deep learning pipeline to be used as a decision support system for the CAD screening procedure. The pipeline performs two classification tasks: firstly, identifying patients who require further clinical investigations and secondly, classifying patients into subgroups based on the degree of stenosis, according to commonly used CAD-RADS thresholds. The pipeline pre-processes multiplanar projections of the coronary arteries, extracted from the original CCTAs, and classifies them using a fine-tuned Multi-Axis Vision Transformer architecture. With the aim of emulating the current clinical practice, the model is trained to assign a per-patient score by stacking the bi-dimensional longitudinal cross-sections of the three main coronary arteries along channel dimension. Furthermore, it generates visually interpretable maps to assess the reliability of the predictions. When run on a database of 1873 three-channel images of 253 patients collected at the Monzino Cardiology Center in Milan, the pipeline obtained an AUC of 0.87 and 0.93 for the two classification tasks, respectively.
According to our knowledge, this is the first model trained to assign CAD-RADS scores learning solely from patient scores and not requiring finer imaging annotation steps that are not part of the clinical routine.", "output": "CAD-RADS scoring of coronary CT angiography with Multi-Axis Vision Transformer: a clinically-inspired deep learning pipeline."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the geometric properties of the functions learned by trained ConvNets in the preactivation space of their convolutional layers, by performing an empirical study of hyperplane arrangements induced by a convolutional layer. We introduce statistics over the weights of a trained network to study local arrangements and relate them to the training dynamics. We observe that trained ConvNets show a significant statistical bias towards regular hyperplane configurations. Furthermore, we find that layers showing biased configurations are critical to validation performance for the architectures considered, trained on CIFAR10, CIFAR100 and ImageNet.", "output": "Hyperplane Arrangements of Trained ConvNets Are Biased."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces two new ensemble-based methods to reduce the data and computation costs of image classification. They can be used with any set of classifiers and do not require additional training. In the first approach, data usage is reduced by only analyzing a full-sized image if the model has low confidence in classifying a low-resolution pixelated version.
When applied on the best performing classifiers considered here, data usage is reduced by 61.2% on MNIST, 69.6% on KMNIST, 56.3% on FashionMNIST, 84.6% on SVHN, 40.6% on ImageNet, and 27.6% on ImageNet-V2, all with a less than 5% reduction in accuracy. However, for CIFAR-10, the pixelated data are not particularly informative, and the ensemble approach increases data usage while reducing accuracy. In the second approach, compute costs are reduced by only using a complex model if a simpler model has low confidence in its classification. Computation cost is reduced by 82.1% on MNIST, 47.6% on KMNIST, 72.3% on FashionMNIST, 86.9% on SVHN, 89.2% on ImageNet, and 81.5% on ImageNet-V2, all with a less than 5% reduction in accuracy; for CIFAR-10 the corresponding improvements are smaller at 13.5%. When cost is not an object, choosing the projection from the most confident model for each observation increases validation accuracy to 81.0% from 79.3% for ImageNet and to 69.4% from 67.5% for ImageNet-V2.", "output": "Problem-dependent attention and effort in neural networks with applications to image resolution and model selection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image classifiers often rely overly on peripheral attributes that have a strong correlation with the target class (i.e., dataset bias) when making predictions. Due to the dataset bias, the model correctly classifies data samples including bias attributes (i.e., bias-aligned samples) while failing to correctly predict those without bias attributes (i.e., bias-conflicting samples). Recently, a myriad of studies focus on mitigating such dataset bias, the task of which is referred to as debiasing. However, our comprehensive study indicates several issues need to be improved when conducting evaluation of debiasing in image classification.
First, most of the previous studies do not specify how they select their hyper-parameters and model checkpoints (i.e., tuning criterion). Second, the debiasing studies until now evaluated their proposed methods on datasets with excessively high bias-severities, showing degraded performance on datasets with low bias severity. Third, the debiasing studies do not share consistent experimental settings (e.g., datasets and neural networks) which need to be standardized for fair comparisons. Based on such issues, this paper 1) proposes an evaluation metric `Align-Conflict (AC) score' for the tuning criterion, 2) includes experimental settings with low bias severity and shows that they are yet to be explored, and 3) unifies the standardized experimental settings to promote fair comparisons between debiasing methods. We believe that our findings and lessons inspire future researchers in debiasing to further push state-of-the-art performances with fair comparisons.", "output": "Improving Evaluation of Debiasing in Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing body of work studies Blindspot Discovery Methods (\"BDM\"s): methods that use an image embedding to find semantically meaningful (i.e., united by a human-understandable concept) subsets of the data where an image classifier performs significantly worse. Motivated by observed gaps in prior work, we introduce a new framework for evaluating BDMs, SpotCheck, that uses synthetic image datasets to train models with known blindspots and a new BDM, PlaneSpot, that uses a 2D image representation. We use SpotCheck to run controlled experiments that identify factors that influence BDM performance (e.g., the number of blindspots in a model, or features used to define the blindspot) and show that PlaneSpot is competitive with and in many cases outperforms existing BDMs.
Importantly, we validate these findings by designing additional experiments that use real image data from MS-COCO, a large image benchmark dataset. Our findings suggest several promising directions for future work on BDM design and evaluation. Overall, we hope that the methodology and analyses presented in this work will help facilitate a more rigorous science of blindspot discovery.", "output": "Towards a More Rigorous Science of Blindspot Discovery in Image Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present NeuriCam, a novel deep learning-based system to achieve video capture from low-power dual-mode IoT camera systems. Our idea is to design a dual-mode camera system where the first mode is low-power (1.1 mW) but only outputs grey-scale, low resolution, and noisy video and the second mode consumes much higher power (100 mW) but outputs color and higher resolution images. To reduce total energy consumption, we heavily duty cycle the high power mode to output an image only once every second. The data for this camera system is then wirelessly sent to a nearby plugged-in gateway, where we run our real-time neural network decoder to reconstruct a higher-resolution color video. To achieve this, we introduce an attention feature filter mechanism that assigns different weights to different features, based on the correlation between the feature map and the contents of the input frame at each spatial location. We design a wireless hardware prototype using off-the-shelf cameras and address practical issues including packet loss and perspective mismatch. Our evaluations show that our dual-camera approach reduces energy consumption by 7x compared to existing systems.
Further, our model achieves an average greyscale PSNR gain of 3.7 dB over prior single and dual-camera video super-resolution methods and 5.6 dB RGB gain over prior color propagation methods. Open-source code: ", "output": "NeuriCam: Key-Frame Video Super-Resolution and Colorization for IoT Cameras."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Millimeter Wave (mmWave) Radar is gaining popularity as it can work in adverse environments like smoke, rain, snow, poor lighting, etc. Prior work has explored the possibility of reconstructing 3D skeletons or meshes from the noisy and sparse mmWave Radar signals. However, it is unclear how accurately we can reconstruct the 3D body from the mmWave signals across scenes and how it performs compared with cameras, which are important aspects needed to be considered when either using mmWave radars alone or combining them with cameras. To answer these questions, an automatic 3D body annotation system is first designed and built up with multiple sensors to collect a large-scale dataset. The dataset consists of synchronized and calibrated mmWave radar point clouds and RGB(D) images in different scenes and skeleton/mesh annotations for humans in the scenes. With this dataset, we train state-of-the-art methods with inputs from different sensors and test them in various scenarios. The results demonstrate that 1) despite the noise and sparsity of the generated point clouds, the mmWave radar can achieve better reconstruction accuracy than the RGB camera but worse than the depth camera; 2) the reconstruction from the mmWave radar is affected by adverse weather conditions moderately while the RGB(D) camera is severely affected.
Further, analysis of the dataset and the results shadow insights on improving the reconstruction from the mmWave radar and the combination of signals from different sensors.", "output": "mmBody Benchmark: 3D Body Reconstruction Dataset and Analysis for Millimeter Wave Radar."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-quality HDRIs (High Dynamic Range Images), typically HDR panoramas, are one of the most popular ways to create photorealistic lighting and 360-degree reflections of 3D scenes in graphics. Given the difficulty of capturing HDRIs, a versatile and controllable generative model is highly desired, where layman users can intuitively control the generation process. However, existing state-of-the-art methods still struggle to synthesize high-quality panoramas for complex scenes. In this work, we propose a zero-shot text-driven framework, Text2Light, to generate 4K+ resolution HDRIs without paired training data. Given a free-form text as the description of the scene, we synthesize the corresponding HDRI with two dedicated steps: 1) text-driven panorama generation in low dynamic range (LDR) and low resolution, and 2) super-resolution inverse tone mapping to scale up the LDR panorama both in resolution and dynamic range. Specifically, to achieve zero-shot text-driven panorama generation, we first build dual codebooks as the discrete representation for diverse environmental textures. Then, driven by the pre-trained CLIP model, a text-conditioned global sampler learns to sample holistic semantics from the global codebook according to the input text. Furthermore, a structure-aware local sampler learns to synthesize LDR panoramas patch-by-patch, guided by holistic semantics.
To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere. This continuous representation enables a versatile module to upscale the resolution and dynamic range simultaneously. Extensive experiments demonstrate the superior capability of Text2Light in generating high-quality HDR panoramas. In addition, we show the feasibility of our work in realistic rendering and immersive VR.", "output": "Text2Light: Zero-Shot Text-Driven HDR Panorama Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semi-supervised learning via a teacher-student network can train a model effectively on a few labeled samples. It enables a student model to distill knowledge from the teacher's predictions of extra unlabeled data. However, such knowledge flow is typically unidirectional, leaving the performance vulnerable to the quality of the teacher model. In this paper, we seek robust 3D reconstruction of stereo endoscopic images by proposing a novel fashion of bidirectional learning between two learners, each of which can play both roles of teacher and student concurrently. Specifically, we introduce two self-supervisions, i.e., Adaptive Cross Supervision (ACS) and Adaptive Parallel Supervision (APS), to learn a dual-branch convolutional neural network. The two branches predict two different disparity probability distributions for the same position, and output their expectations as disparity values. The learned knowledge flows across branches along two directions: a cross direction (disparity guides distribution in ACS) and a parallel direction (disparity guides disparity in APS). Moreover, each branch also learns confidences to dynamically refine its provided supervisions. 
In ACS, the predicted disparity is softened into a unimodal distribution, and the lower the confidence, the smoother the distribution. In APS, the incorrect predictions are suppressed by lowering the weights of those with low confidence. With the adaptive bidirectional learning, the two branches enjoy well-tuned supervisions, and eventually converge on a consistent and more accurate disparity estimation. The extensive and comprehensive experimental results on four public datasets demonstrate our superior performance over other state-of-the-arts with a relative decrease of averaged disparity error by at least 9.76%.", "output": "Bidirectional Semi-supervised Dual-branch CNN for Robust 3D Reconstruction of Stereo Endoscopic Images via Adaptive Cross and Parallel Supervisions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a simple and novel method for generating 3D human motion from complex natural language sentences, which describe different velocity, direction and composition of all kinds of actions. Different from existing methods that use classical generative architecture, we apply the Denoising Diffusion Probabilistic Model to this task, synthesizing diverse motion results under the guidance of texts. The diffusion model converts white noise into structured 3D motion by a Markov process with a series of denoising steps and is efficiently trained by optimizing a variational lower bound. To achieve the goal of text-conditioned image synthesis, we use the classifier-free guidance strategy to fuse text embedding into the model during training. Our experiments demonstrate that our model achieves competitive results on the HumanML3D test set quantitatively and can generate more visually natural and diverse examples. 
We also show with experiments that our model is capable of zero-shot generation of motions for unseen text guidance.", "output": "Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines. The code is available at ", "output": "CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations. Recently, there have been efforts to compute the scene flow from 3D point clouds. A common approach is to train a regression model that consumes source and target point clouds and outputs the per-point translation vector. 
An alternative is to learn point matches between the point clouds concurrently with regressing a refinement of the initial correspondence flow. In both cases, the learning task is very challenging since the flow regression is done in the free 3D space, and a typical solution is to resort to a large annotated synthetic dataset. We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision. In contrast to previous work, we train a pure correspondence model focused on learning point feature representation and initialize the flow as the difference between a source point and its softly corresponding target point. Then, in the run-time phase, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent and accurate flow field between the point clouds. Experiments on widespread datasets demonstrate the performance gains achieved by our method compared to existing leading techniques while using a fraction of the training data. Our code is publicly available at ", "output": "SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Novel view synthesis and 3D modeling using implicit neural field representation are shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors. These methods cannot benefit from high-level vague scene priors provided in terms of scenes' descriptions. 
In this work, we aim to leverage the geometric prior of Manhattan scenes to improve the implicit neural radiance field representations. More precisely, we assume that only the knowledge of the indoor scene (under investigation) being Manhattan is known -- with no additional information whatsoever -- with an unknown Manhattan coordinate frame. Such high-level prior is used to self-supervise the surface normals derived explicitly in the implicit neural fields. Our modeling allows us to group the derived normals and exploit their orthogonality constraints for self-supervision. Our exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over the established baselines.", "output": "Neural Radiance Fields for Manhattan Scenes with Unknown Manhattan Frame."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation that disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points. To facilitate generalization, we train our model on a newly-captured dataset of over 5200 head scans from 255 different identities using a custom high-end 3D scanning setup. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M mesh faces per scan. 
Finally, we demonstrate that our approach outperforms state-of-the-art methods in terms of fitting error and reconstruction quality.", "output": "Learning Neural Parametric Head Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy. Our project page is ", "output": "NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We learn a visual representation that captures information about the camera that recorded a given photo. To do this, we train a multimodal embedding between image patches and the EXIF metadata that cameras automatically insert into image files. Our model represents this metadata by simply converting it to text and then processing it with a transformer. The features that we learn significantly outperform other self-supervised and supervised features on downstream image forensics and calibration tasks. 
In particular, we successfully localize spliced image regions \"zero shot\" by clustering the visual embeddings for all of the patches within an image.", "output": "EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing and present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms for acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin in three publicly available SITS semantic segmentation and classification datasets. All model, training and evaluation codes are made publicly available to facilitate further research.", "output": "ViTs for SITS: Vision Transformers for Satellite Image Time Series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a generative document-specific approach to character analysis and recognition in text lines. 
Our main idea is to build on unsupervised multi-object segmentation methods and in particular those that reconstruct images based on a limited amount of visual elements, called sprites. Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted and quantitatively evaluated on real images of text and that they can be trained using weak supervision are significant advances. Second, we show the potential of our method for new applications, more specifically in the field of paleography, which studies the history and variations of handwriting, and for cipher analysis. We demonstrate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher and historical handwritten charters from the 12th and early 13th century.", "output": "The Learnable Typewriter: A Generative Approach to Text Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective BEV object detection on infrastructure can greatly improve traffic scene understanding and vehicle-to-infrastructure (V2I) cooperative perception. However, cameras installed on infrastructure have various postures, and previous BEV detection methods rely on accurate calibration, which is difficult for practical applications due to inevitable natural factors (e.g., wind and snow). 
In this paper, we propose a Calibration-free BEV Representation (CBR) network, which achieves 3D detection based on BEV representation without calibration parameters and additional depth supervision. Specifically, we utilize two multi-layer perceptrons for decoupling the features from perspective view to front view and bird's-eye view under boxes-induced foreground supervision. Then, a cross-view feature fusion module matches features from orthogonal views according to similarity and conducts BEV feature enhancement with front view features. Experimental results on DAIR-V2X demonstrate that CBR achieves acceptable performance without any camera parameters and is naturally not affected by calibration noises. We hope CBR can serve as a baseline for future research addressing practical challenges of infrastructure perception.", "output": "Calibration-free BEV Representation for Infrastructure Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present our advanced solutions to the two sub-challenges of Affective Behavior Analysis in the wild (ABAW) 2023: the Emotional Reaction Intensity (ERI) Estimation Challenge and Expression (Expr) Classification Challenge. ABAW 2023 aims to tackle the challenge of affective behavior analysis in natural contexts, with the ultimate goal of creating intelligent machines and robots that possess the ability to comprehend human emotions, feelings, and behaviors. For the Expression Classification Challenge, we propose a streamlined approach that handles the challenges of classification effectively. However, our main contribution lies in our use of diverse models and tools to extract multimodal features such as audio and video cues from the Hume-Reaction dataset. 
By studying, analyzing, and combining these features, we significantly enhance the model's accuracy for sentiment prediction in a multimodal context. Furthermore, our method achieves outstanding results on the Emotional Reaction Intensity (ERI) Estimation Challenge, surpassing the baseline method by an impressive 84% increase, as measured by the Pearson Coefficient, on the validation dataset.", "output": "Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data sharing and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data sharing. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We also propose an FL strategy that leverages synthetically generated data to overcome class and size imbalances. 
We also describe the sources of data heterogeneity in the context of FL, and show how even among the correctly labeled populations, disparities can arise due to these biases.", "output": "AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical images usually suffer from image degradation in clinical practice, leading to decreased performance of deep learning-based models. To resolve this problem, most previous works have focused on filtering out degradation-causing low-quality images while ignoring their potential value for models. Through effectively learning and leveraging the knowledge of degradations, models can better resist their adverse effects and avoid misdiagnosis. In this paper, we raise the problem of image quality-aware diagnosis, which aims to take advantage of low-quality images and image quality labels to achieve a more accurate and robust diagnosis. However, the diversity of degradations and superficially unrelated targets between image quality assessment and disease diagnosis makes it still quite challenging to effectively leverage quality labels to assist diagnosis. Thus, to tackle these issues, we propose a novel meta-knowledge co-embedding network, consisting of two subnets: Task Net and Meta Learner. Task Net constructs an explicit quality information utilization mechanism to enhance diagnosis via knowledge co-embedding features, while Meta Learner ensures the effectiveness and constrains the semantics of these features via meta-learning and joint-encoding masking. 
Superior performance on five datasets with four widely-used medical imaging modalities demonstrates the effectiveness and generalizability of our method.", "output": "Image Quality-aware Diagnosis via Meta-knowledge Co-embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the realm of machine learning, the study of anomaly detection and localization within image data has gained substantial traction, particularly for practical applications such as industrial defect detection. While the majority of existing methods predominantly use Convolutional Neural Networks (CNN) as their primary network architecture, we introduce a novel approach based on the Transformer backbone network. Our method employs a two-stage incremental learning strategy. During the first stage, we train a Masked Autoencoder (MAE) model solely on normal images. In the subsequent stage, we apply pixel-level data augmentation techniques to generate corrupted normal images and their corresponding pixel labels. This process allows the model to learn how to repair corrupted regions and classify the status of each pixel. Ultimately, the model generates a pixel reconstruction error matrix and a pixel anomaly probability matrix. These matrices are then combined to produce an anomaly scoring matrix that effectively detects abnormal regions. 
When benchmarked against several state-of-the-art CNN-based methods, our approach exhibits superior performance on the MVTec AD dataset, achieving an impressive 97.6% AUC.", "output": "ISSTAD: Incremental Self-Supervised Learning Based on Transformer for Anomaly Detection and Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an \"expert model\" for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive amount of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We adopt these clusters to train a massive amount of expert models specialized on different subjects. The apprentice model SuTI then learns to mimic the behavior of these experts through the proposed apprenticeship learning algorithm. SuTI can generate high-quality and customized subject-specific images 20x faster than optimization-based SoTA methods. 
On the challenging DreamBench and DreamBench-v2, our human evaluation shows that SuTI can significantly outperform existing approaches like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen while performing on par with DreamBooth.", "output": "Subject-driven Text-to-Image Generation via Apprenticeship Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) aims to learn a single global model that enables the central server to help the model training in local clients without accessing their local data. The key challenge of FL is the heterogeneity of local data in different clients, such as heterogeneous label distribution and feature shift, which could lead to significant performance degradation of the learned models. Although many studies have been proposed to address the heterogeneous label distribution problem, few studies attempt to explore the feature shift issue. To address this issue, we propose a simple yet effective algorithm, namely personalized Federated learning with Local Attention (pFedLA), by incorporating the attention mechanism into personalized models of clients while keeping the attention blocks client-specific. Specifically, two modules are proposed in pFedLA, i.e., the personalized single attention module and the personalized hybrid attention module. In addition, the proposed pFedLA method is quite flexible and general as it can be incorporated into any FL method to improve their performance without introducing additional communication costs. 
Extensive experiments demonstrate that the proposed pFedLA method can boost the performance of state-of-the-art FL methods on different tasks such as image classification and object detection.", "output": "Personalized Federated Learning with Local Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Let us rethink the real-world scenarios that require human motion prediction techniques, such as human-robot collaboration. Current works simplify the task of predicting human motions into a one-off process of forecasting a short future sequence (usually no longer than 1 second) based on a historical observed one. However, such simplification may fail to meet practical needs due to the neglect of the fact that motion prediction in real applications is not an isolated \"observe then predict\" unit, but a consecutive process composed of many rounds of such unit, semi-overlapped along the entire sequence. As time goes on, the predicted part of the previous round has its corresponding ground truth observable in the new round, but their deviation in-between is neither exploited nor able to be captured by the existing isolated learning fashion. In this paper, we propose DeFeeNet, a simple yet effective network that can be added on existing one-off prediction models to realize deviation perception and feedback when applied to the consecutive motion prediction task. At each prediction round, the deviation generated by the previous unit is first encoded by our DeFeeNet, and then incorporated into the existing predictor to enable a deviation-aware prediction manner, which, for the first time, allows for information transmission across adjacent prediction units. We design two versions of DeFeeNet as MLP-based and GRU-based, respectively. 
On Human3.6M and the more complicated BABEL, experimental results indicate that our proposed network improves consecutive human motion prediction performance regardless of the basic model.", "output": "DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual simultaneous localization and mapping (SLAM) systems face challenges in detecting loop closure under the circumstance of large viewpoint changes. In this paper, we present an object-based loop closure detection method based on the spatial layout and semantic consistency of the 3D scene graph. Firstly, we propose an object-level data association approach based on the semantic information from semantic labels, intersection over union (IoU), object color, and object embedding. Subsequently, multi-view bundle adjustment with the associated objects is utilized to jointly optimize the poses of objects and cameras. We represent the refined objects as a 3D spatial graph with semantics and topology. Then, we propose a graph matching approach to select correspondence objects based on the structure layout and semantic property similarity of vertices' neighbors. Finally, we jointly optimize camera trajectories and object poses in an object-level pose graph optimization, which results in a globally consistent map. 
Experimental results demonstrate that our proposed data association approach can construct more accurate 3D semantic maps, and our loop closure method is more robust than point-based and object-based methods in circumstances with large viewpoint changes.", "output": "Loop Closure Detection Based on Object-level Spatial Layout and Semantic Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel finetuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.", "output": "DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between these two images. Previous methods search for correspondence across the entire reference image, and this type of global matching is prone to mismatch. 
We summarize the difficulties in two aspects: (1) When the reference image only contains a part of the objects related to the target image, improper correspondence will be established in unrelated regions. (2) It is prone to mismatch in regions where the shape or texture of the object is easily confused. To overcome these issues, we propose SPColor, a semantic prior guided exemplar-based image colorization framework. Different from previous methods, SPColor first coarsely classifies pixels of the reference and target images into several pseudo-classes under the guidance of a semantic prior, then the correspondences are only established locally between the pixels in the same class via the newly designed semantic prior guided correspondence network. In this way, improper correspondence between different semantic classes is explicitly excluded, and the mismatch is obviously alleviated. Besides, to better preserve the color from the reference, a similarity masked perceptual loss is designed. Note that the carefully designed SPColor utilizes the semantic prior provided by an unsupervised segmentation model, which is free of additional manual semantic annotations. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on a public dataset.", "output": "SPColor: Semantic Prior Guided Exemplar-based Image Colorization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input image patches with those prototypes. 
We propose the first medical prototype network (MProtoNet) to extend ProtoPNet to brain tumor classification with 3D multi-parametric magnetic resonance imaging (mpMRI) data. To address different requirements between 2D natural images and 3D mpMRIs, especially in terms of localizing attention regions, a new attention module with soft masking and online-CAM loss is introduced. Soft masking helps sharpen attention maps, while online-CAM loss directly utilizes image-level labels when training the attention module. MProtoNet achieves statistically significant improvements in interpretability metrics of both correctness and localization coherence (with a best activation precision of $0.713\\pm0.058$) without human-annotated labels during training, when compared with GradCAM and several ProtoPNet variants. The source code is available at", "output": "MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score.
We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.", "output": "Sign Language Translation from Instructional Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Flatlandia, a novel problem for visual localization of an image from object detections, composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects with respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating the latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than the standard large-scale 3D models often used in visual localization, while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.", "output": "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-based Visual Question Answering examines the document understanding of document images in the context of natural language questions.
We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from various aspects, including document element recognition, document layout structural understanding, as well as contextual understanding and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding, which is limited to a single document page, to the new scale of asking questions over full documents of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between different document elements to boost document structural understanding. The performance is compared with several baselines over different question types and tasks. (The full dataset will be released after paper acceptance.)", "output": "PDF-VQA: A New Dataset for Real-World VQA on PDF Documents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary).
Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input that will activate more computational units of dynamic models during inference.
We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.", "output": "GradMDM: Adversarial Attack on Dynamic Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights.
In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally.", "output": "Meta-Learned Models of Cognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Coronaviruses are membrane-enveloped, non-segmented positive-strand RNA viruses belonging to the Coronaviridae family. Various animal species, mainly mammalian and avian, are severely infected by various coronaviruses, causing serious concerns like the recent pandemic (COVID-19). Therefore, building a deeper understanding of these viruses is essential to devise prevention and mitigation mechanisms. In the Coronavirus genome, an essential structural region is the spike region, and it's responsible for attaching the virus to the host cell membrane. Therefore, the usage of only the spike protein, instead of the full genome, provides most of the essential information for performing analyses such as host classification. In this paper, we propose a novel method for predicting the host specificity of coronaviruses by analyzing spike protein sequences from different viral subgenera and species. Our method involves using the Poisson correction distance to generate a distance matrix, followed by using a radial basis function (RBF) kernel and kernel principal component analysis (PCA) to generate a low-dimensional embedding. Finally, we apply classification algorithms to the low-dimensional embedding to generate the resulting predictions of the host specificity of coronaviruses. We provide theoretical proofs for the non-negativity, symmetry, and triangle inequality properties of the Poisson correction distance metric, which are important properties in a machine-learning setting.
By encoding the spike protein structure and sequences using this comprehensive approach, we aim to uncover hidden patterns in the biological sequences to make accurate predictions about host specificity. Finally, our classification results illustrate that our method can achieve higher predictive accuracy and improve performance over existing baselines.", "output": "PCD2Vec: A Poisson Correction Distance-Based Approach for Viral Host Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers the problem of testing the maximum in-degree of the Bayes net underlying an unknown probability distribution $P$ over $\\{0,1\\}^n$, given sample access to $P$. We show that the sample complexity of the problem is $\\tilde{\\Theta}(2^{n/2}/\\varepsilon^2)$. Our algorithm relies on a testing-by-learning framework, previously used to obtain sample-optimal testers; in order to apply this framework, we develop new algorithms for ``near-proper'' learning of Bayes nets, and high-probability learning under $\\chi^2$ divergence, which are of independent interest.", "output": "Near-Optimal Degree Testing for Bayes Nets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans excel at continually acquiring, consolidating, and retaining information from an ever-changing environment, whereas artificial neural networks (ANNs) exhibit catastrophic forgetting. There are considerable differences in the complexity of synapses, the processing of information, and the learning mechanisms in biological neural networks and their artificial counterparts, which may explain the mismatch in performance.
We consider a biologically plausible framework that constitutes separate populations of exclusively excitatory and inhibitory neurons that adhere to Dale's principle, in which the excitatory pyramidal neurons are augmented with dendritic-like structures for context-dependent processing of stimuli. We then conduct a comprehensive study on the role and interactions of different mechanisms inspired by the brain, including sparse non-overlapping representations, Hebbian learning, synaptic consolidation, and replay of past activations that accompanied the learning event. Our study suggests that employing multiple complementary mechanisms in a biologically plausible architecture, similar to the brain, may be effective in enabling continual learning in ANNs.", "output": "A Study of Biologically Plausible Neural Network: The Role and Interactions of Brain-Inspired Mechanisms in Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpiling NNs into FPGA and ASIC firmware. This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow that can be deployed for real-time machine learning applications in a wide range of scientific and industrial settings. We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the CERN Large Hadron Collider (LHC).
Given the high collision rate, all data processing must be implemented on custom ASIC and FPGA hardware within a strict area and latency budget. Based on these constraints, we implement an optimized mixed-precision NN classifier for high-momentum particle jets in simulated LHC proton-proton collisions.", "output": "End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT across different model sizes. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
We release our implementation at: ", "output": "Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) as a means of addressing this problem, wherein generative models are fine-tuned using RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment of generative models, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models more effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently assembles a streaming dataset. This dataset serves as the basis for aligning the generative model and can be employed under both offline and online settings. Notably, the sample generation process within RAFT is gradient-free, rendering it compatible with black-box generators.
Through extensive experiments, we demonstrate that our proposed algorithm exhibits strong performance in the context of both large language models and diffusion models.", "output": "RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms, both in their classical and quantum versions, heavily rely on optimization algorithms based on gradients, such as gradient descent and the like. The overall performance is dependent on the appearance of local minima and barren plateaus, which slow down calculations and lead to non-optimal solutions. In practice, this results in dramatic computational and energy costs for AI applications. In this paper we introduce a generic strategy to accelerate and improve the overall performance of such methods, allowing us to alleviate the effect of barren plateaus and local minima. Our method is based on coordinate transformations, somewhat similar to variational rotations, adding extra directions in parameter space that depend on the cost function itself, and which allow us to explore the configuration landscape more efficiently. The validity of our method is benchmarked by boosting a number of quantum machine learning algorithms, getting a very significant improvement in their performance.", "output": "Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Online recognition of gestures is critical for intuitive human-robot interaction (HRI) and to further push collaborative robotics into the market, making robots accessible to more people.
The problem is that it is difficult to achieve accurate gesture recognition in real unstructured environments, often using distorted and incomplete multisensory data. This paper introduces an HRI framework to classify large vocabularies of interwoven static gestures (SGs) and dynamic gestures (DGs) captured with wearable sensors. DG features are obtained by applying data dimensionality reduction to raw data from sensors (resampling with cubic interpolation and principal component analysis). Experimental tests were conducted using the UC2017 hand gesture dataset with samples from eight different subjects. The classification models show an accuracy of 95.6% for a library of 24 SGs with a random forest and 99.3% for 10 DGs using artificial neural networks. These results compare equally or favorably with different commonly used classifiers. Long short-term memory deep networks achieved similar performance in online frame-by-frame classification using raw incomplete data, performing better in terms of accuracy than static models with specially crafted features, but worse in training and inference time. The recognized gestures are used to teleoperate a robot in a collaborative process that consists in preparing a breakfast meal.", "output": "Online Recognition of Incomplete Gesture Data to Interface Collaborative Robots."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of learning conditional distributions of the form $p(G | \\hat{G})$, where $G$ and $\\hat{G}$ are two 3D graphs, using continuous normalizing flows. We derive a semi-equivariance condition on the flow which ensures that conditional invariance to rigid motions holds.
We demonstrate the effectiveness of the technique in the molecular setting of receptor-aware ligand generation.", "output": "Semi-Equivariant Conditional Normalizing Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a distributionally robust approach to regret optimal control of discrete-time linear dynamical systems with quadratic costs subject to stochastic additive disturbance on the state process. The underlying probability distribution of the disturbance process is unknown, but assumed to lie in a given ball of distributions defined in terms of the type-2 Wasserstein distance. In this framework, strictly causal linear disturbance feedback controllers are designed to minimize the worst-case expected regret. The regret incurred by a controller is defined as the difference between the cost it incurs in response to a realization of the disturbance process and the cost incurred by the optimal noncausal controller which has perfect knowledge of the disturbance process realization at the outset. Building on a well-established duality theory for optimal transport problems, we show how to equivalently reformulate this minimax regret optimal control problem as a tractable semidefinite program.
The equivalent dual reformulation also allows us to characterize a worst-case distribution achieving the worst-case expected regret in relation to the distribution at the center of the Wasserstein ball.", "output": "A Distributionally Robust Approach to Regret Optimal Control using the Wasserstein Distance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present the first $\\varepsilon$-differentially private, computationally efficient algorithm that estimates the means of product distributions over $\\{0,1\\}^d$ accurately in total-variation distance, whilst attaining the optimal sample complexity to within polylogarithmic factors. The prior work had either solved this problem efficiently and optimally under weaker notions of privacy, or had solved it optimally while having exponential running times.", "output": "A Polynomial Time, Pure Differentially Private Estimator for Binary Product Distributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Decision tree ensembles use a single data feature at each node for splitting the data. However, splitting in this manner may fail to capture the geometric properties of the data. Thus, oblique decision trees generate an oblique hyperplane for splitting the data at each non-leaf node. Oblique decision trees capture the geometric properties of the data and hence show better generalization. The performance of oblique decision trees depends on the way oblique hyperplanes are generated and the data used for the generation of those hyperplanes. Recently, multiple classifiers have been used in a heterogeneous random forest (RaF) classifier; however, it fails to generate trees of proper depth.
Moreover, recent double RaF studies highlighted that larger trees can be generated by bootstrapping the data at each non-leaf node and splitting the original data instead of the bootstrapped data. The study of heterogeneous RaF lacks the generation of larger trees, while the double RaF based model fails to capture the geometric characteristics of the data. To address these shortcomings, we propose heterogeneous oblique double RaF. The proposed model employs several linear classifiers at each non-leaf node on the bootstrapped data and splits the original data based on the optimal linear classifier. The optimal hyperplane corresponds to the model based on the optimized impurity criterion. The experimental analysis indicates that the performance of the introduced heterogeneous double random forest is comparatively better than the baseline models. To demonstrate the effectiveness of the proposed heterogeneous double random forest, we used it for the diagnosis of schizophrenia. The proposed model predicted the disease more accurately compared to the baseline models.", "output": "Heterogeneous Oblique Double Random Forest."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Edge computing solutions that enable the extraction of high-level information from a variety of sensors are in increasingly high demand. This is due to the increasing number of smart devices that require sensory processing for their application on the edge. To tackle this problem, we present a smart vision sensor System on Chip (SoC), featuring an event-based camera and a low-power asynchronous spiking Convolutional Neuronal Network (sCNN) computing architecture embedded on a single chip. By combining both sensor and processing on a single die, we can lower unit production costs significantly.
Moreover, the simple end-to-end nature of the SoC facilitates small stand-alone applications as well as functioning as an edge node in larger systems. The event-driven nature of the vision sensor delivers high-speed signals in a sparse data stream. This is reflected in the processing pipeline, which focuses on optimising highly sparse computation and minimising latency for 9 sCNN layers to $3.36\\mu s$. Overall, this results in an extremely low-latency visual processing pipeline deployed on a small form factor with a low energy budget and sensor cost. We present the asynchronous architecture, the individual blocks, and the sCNN processing principle, and benchmark against other sCNN-capable processors.", "output": "Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by jointly predicting both a token and its duration, i.e. the number of input frames covered by the emitted token. This is achieved by using a joint network with two outputs which are independently normalized to generate distributions over tokens and durations. During inference, TDT models can skip input frames guided by the predicted duration output, which makes them significantly faster than conventional Transducers, which process the encoder output frame by frame. TDT models achieve both better accuracy and significantly faster inference than conventional Transducers on different sequence transduction tasks.
TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than RNN-Transducers. TDT models for Speech Translation achieve an absolute gain of over 1 BLEU on the MUST-C test compared with conventional Transducers, and their inference is 2.27X faster. In Speech Intent Classification and Slot Filling tasks, TDT models improve the intent accuracy by up to over 1% (absolute) over conventional Transducers, while running up to 1.28X faster.", "output": "Efficient Sequence Transduction by Jointly Predicting Tokens and Durations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel approach for black-box VI that bypasses the difficulties of stochastic gradient ascent, including the task of selecting step-sizes. Our approach involves using a sequence of sample average approximation (SAA) problems. SAA approximates the solution of stochastic optimization problems by transforming them into deterministic ones. We use quasi-Newton methods and line search to solve each deterministic optimization problem and present a heuristic policy to automate hyperparameter selection. Our experiments show that our method simplifies the VI problem and achieves faster performance than existing methods.", "output": "Sample Average Approximation for Black-Box VI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many collective systems exist in nature far from equilibrium, ranging from cellular sheets up to flocks of birds. These systems reflect a form of active matter, whereby individual material components have internal energy. Under specific parameter regimes, these active systems undergo phase transitions whereby small fluctuations of single components can lead to global changes to the rheology of the system.
Simulations and methods from statistical physics are typically used to understand and predict these phase transitions for real-world observations. In this work, we demonstrate that simulation-based inference can be used to robustly infer active matter parameters from system observations. Moreover, we demonstrate that a small number (from one to three) of snapshots of the system can be used for parameter inference and that this graph-informed approach outperforms typical metrics such as the average velocity or mean square displacement of the system. Our work highlights that high-level system information is contained within the relational structure of a collective system and that this can be exploited to better couple models to data.", "output": "Graph-informed simulation-based inference for models of active matter."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study actively labeling streaming data, where an active learner is faced with a stream of data points and must carefully choose which of these points to label via an expensive experiment. Such problems frequently arise in applications such as healthcare and astronomy. We first study a setting where the data's inputs belong to one of $K$ discrete distributions and formalize this problem via a loss that captures the labeling cost and the prediction error. When the labeling cost is $B$, our algorithm, which chooses to label a point if the uncertainty is larger than a time- and cost-dependent threshold, achieves a worst-case upper bound of $O(B^{\\frac{1}{3}} K^{\\frac{1}{3}} T^{\\frac{2}{3}})$ on the loss after $T$ rounds. We also provide a more nuanced upper bound which demonstrates that the algorithm can adapt to the arrival pattern, and achieves better performance when the arrival pattern is more favorable. We complement both upper bounds with matching lower bounds.
We next study this problem when the inputs belong to a continuous domain and the output of the experiment is a smooth function with bounded RKHS norm. After $T$ rounds in $d$ dimensions, we show that the loss is bounded by $O(B^{\\frac{1}{d+3}} T^{\\frac{d+2}{d+3}})$ in an RKHS with a squared exponential kernel and by $O(B^{\\frac{1}{2d+3}} T^{\\frac{2d+2}{2d+3}})$ in an RKHS with a Matérn kernel. Our empirical evaluation demonstrates that our method outperforms other baselines in several synthetic experiments and two real experiments in medicine and astronomy.", "output": "Active Cost-aware Labeling of Streaming Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ragnarock is a virtual reality (VR) rhythm game in which you play a Viking captain competing in a longship race. With two hammers, the task is to crush the incoming runes in sync with epic Viking music. The runes are defined by a beat map which the player can manually create. The creation of beat maps takes hours. This work aims to automate the process of beat map creation, also known as the task of learning to choreograph. The assignment is broken down into three parts: determining the timing of the beats (action placement), determining where in space the runes connected with the chosen beats should be placed (action selection) and web-application creation. For the first task of action placement, extraction of predominant local pulse (PLP) information from music recordings is used. This approach allows to learn where and how many beats are supposed to be placed. For the second task of action selection, Recurrent Neural Networks (RNN) are used, specifically Gated recurrent unit (GRU) to learn sequences of beats, and their patterns to be able to recreate those rules and receive completely new levels.
Then the last task was to build a solution for non-technical players, the task was to combine the results of the first and the second parts into a web application for easy use. For this task the frontend was built using JavaScript and React and the backend - python and FastAPI.", "output": "Level generation for rhythm VR games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We demonstrate the effectiveness of our model through the design of structured nonlinear photonic crystals (NLPCs) and shaped pump beams; and show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. The learning of NLPC structures offers a promising new avenue for shaping and controlling arbitrary quantum states and enables all-optical coherent control of the generated states. We believe that this approach can readily be extended from bulky crystals to thin Metasurfaces and potentially applied to other quantum systems sharing a similar Hamiltonian structures, such as superfluids and superconductors.", "output": "Designing Nonlinear Photonic Crystals for High-Dimensional Quantum State Engineering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Out-of-distribution (OOD) detection aims to identify test examples that do not belong to the training distribution and are thus unlikely to be predicted reliably.
Despite a plethora of existing works, most of them focused only on the scenario where OOD examples come from semantic shift (e.g., unseen categories), ignoring other possible causes (e.g., covariate shift). In this paper, we present a novel, unifying framework to study OOD detection in a broader scope. Instead of detecting OOD examples from a particular cause, we propose to detect examples that a deployed machine learning model (e.g., an image classifier) is unable to predict correctly. That is, whether a test example should be detected and rejected or not is ``model-specific''. We show that this framework unifies the detection of OOD examples caused by semantic shift and covariate shift, and closely addresses the concern of applying a machine learning model to uncontrolled environments. We provide an extensive analysis that involves a variety of models (e.g., different architectures and training strategies), sources of OOD examples, and OOD detection approaches, and reveal several insights into improving and understanding OOD detection in uncontrolled environments.", "output": "Unified Out-Of-Distribution Detection: A Model-Specific Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLM) are a new class of computation engines, \"programmed\" via prompt engineering. We are still learning how to best \"program\" these LLMs to help developers. We start with the intuition that developers tend to consciously and unconsciously have a collection of semantics facts in mind when working on coding tasks. Mostly these are shallow, simple facts arising from a quick read.
For a function, examples of facts might include parameter and local variable names, return expressions, simple pre- and post-conditions, and basic control and data flow, etc. One might assume that the powerful multi-layer architecture of transformer-style LLMs makes them inherently capable of doing this simple level of \"code analysis\" and extracting such information, implicitly, while processing code: but are they, really? If they aren't, could explicitly adding this information help? Our goal here is to investigate this question, using the code summarization task and evaluate whether automatically augmenting an LLM's prompt with semantic facts explicitly, actually helps. Prior work shows that LLM performance on code summarization benefits from few-shot samples drawn either from the same-project or from examples found via information retrieval methods (such as BM25). While summarization performance has steadily increased since the early days, there is still room for improvement: LLM performance on code summarization still lags its performance on natural-language tasks like translation and text summarization. We find that adding semantic facts actually does help! This approach improves performance in several different settings suggested by prior work, including for two different Large Language Models. In most cases, improvement nears or exceeds 2 BLEU; for the PHP language in the challenging CodeSearchNet dataset, this augmentation actually yields performance surpassing 30 BLEU.", "output": "Improving Few-Shot Prompts with Relevant Static Analysis Products."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This technical report studies the problem of ranking from pairwise comparisons in the classical Bradley-Terry-Luce (BTL) model, with a focus on score estimation.
For general graphs, we show that, with sufficiently many samples, maximum likelihood estimation (MLE) achieves an entrywise estimation error matching the Cramér-Rao lower bound, which can be stated in terms of effective resistances; the key to our analysis is a connection between statistical estimation and iterative optimization by preconditioned gradient descent. We are also particularly interested in graphs with locality, where only nearby items can be connected by edges; our analysis identifies conditions under which locality does not hurt, i.e. comparing the scores between a pair of items that are far apart in the graph is nearly as easy as comparing a pair of nearby items. We further explore divide-and-conquer algorithms that can provably achieve similar guarantees even in the regime with the sparsest samples, while enjoying certain computational advantages. Numerical results validate our theory and confirm the efficacy of the proposed algorithms.", "output": "Ranking from Pairwise Comparisons in General Graphs and Graphs with Locality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic Graph Neural Networks (DGNNs) are becoming increasingly popular due to their effectiveness in analyzing and predicting the evolution of complex interconnected graph-based systems. However, hardware deployment of DGNNs still remains a challenge. First, DGNNs do not fully utilize hardware resources because temporal data dependencies cause low hardware parallelism. Additionally, there is currently a lack of generic DGNN hardware accelerator frameworks, and existing GNN accelerator frameworks have limited ability to handle dynamic graphs with changing topologies and node features.
To address the aforementioned challenges, in this paper, we propose DGNN-Booster, which is a novel Field-Programmable Gate Array (FPGA) accelerator framework for real-time DGNN inference using High-Level Synthesis (HLS). It includes two different FPGA accelerator designs with different dataflows that can support the most widely used DGNNs. We showcase the effectiveness of our designs by implementing and evaluating two representative DGNN models on ZCU102 board and measuring the end-to-end performance. The experiment results demonstrate that DGNN-Booster can achieve a speedup of up to 5.6x compared to the CPU baseline (6226R), 8.4x compared to the GPU baseline (A6000) and 2.1x compared to the FPGA baseline without applying optimizations proposed in this paper. Moreover, DGNN-Booster can achieve over 100x and over 1000x runtime energy efficiency than the CPU and GPU baseline respectively. Our implementation code and on-board measurements are publicly available at", "output": "DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a simple non-linear embedding adaptation layer, which is fine-tuned on top of fixed pre-trained features for one-shot tasks, improving significantly transductive entropy-based inference for low-shot regimes. Our norm-induced transformation could be understood as a re-parametrization of the feature space to disentangle the representations of different classes in a task specific manner. It focuses on the relevant feature dimensions while hindering the effects of non-relevant dimensions that may cause overfitting in a one-shot setting.
We also provide an interpretation of our proposed feature transformation in the basic case of few-shot inference with K-means clustering. Furthermore, we give an interesting bound-optimization link between K-means and entropy minimization. This emphasizes why our feature transformation is useful in the context of entropy minimization. We report comprehensive experiments, which show consistent improvements over a variety of one-shot benchmarks, outperforming recent state-of-the-art methods.", "output": "Task Adaptive Feature Transformation for One-Shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature suggests the integration of the estimation and optimization processes, by selecting model parameters that lead to the best empirical objective performance. Such an integrated approach can be readily shown to outperform simple ``estimate then optimize\" when the model is misspecified. In this paper, we argue that when the model class is rich enough to cover the ground truth, the performance ordering between the two approaches is reversed for nonlinear problems in a strong sense. Simple ``estimate then optimize\" outperforms the integrated approach in terms of stochastic dominance of the asymptotic optimality gap, i.e., the mean, all other moments, and the entire asymptotic distribution of the optimality gap is always better. Analogous results also hold under constrained settings and when contextual features are available.
We also provide experimental findings to support our theory.", "output": "Estimate-Then-Optimize Versus Integrated-Estimation-Optimization: A Stochastic Dominance Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we propose to use various artificial neural network (ANN) structures for modeling and compensation of intra- and inter-subcarrier fiber nonlinear interference in digital subcarrier multiplexing (DSCM) optical transmission systems. We perform nonlinear channel equalization by employing different ANN cores including convolutional neural networks (CNN) and long short-term memory (LSTM) layers. We start to compensate the fiber nonlinearity distortion in DSCM systems by a fully connected network across all subcarriers. In subsequent steps, and borrowing from fiber nonlinearity analysis, we gradually upgrade the designs towards modular structures with better performance-complexity advantages. Our study shows that putting proper macro structures in design of ANN nonlinear equalizers in DSCM systems can be crucial for practical solutions in future generations of coherent optical transceivers.", "output": "Neural Network Architectures for Optical Channel Nonlinear Compensation in Digital Subcarrier Multiplexing Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task DNN models, they can be further optimized via model compression. Numerous structured pruning methods are already developed that can readily achieve speedups in single-task models, but the pruning of multi-task networks has not yet been extensively studied.
In this work, we investigate the effectiveness of structured pruning on multi-task models. We use an existing single-task filter pruning criterion and also introduce an MTL-based filter pruning criterion for estimating the filter importance scores. We prune the model using an iterative pruning strategy with both pruning methods. We show that, with careful hyper-parameter tuning, architectures obtained from different pruning methods do not have significant differences in their performances across tasks when the number of parameters is similar. We also show that iterative structure pruning may not be the best way to achieve a well-performing pruned model because, at extreme pruning levels, there is a high drop in performance across all tasks. But when the same models are randomly initialized and re-trained, they show better results.", "output": "Structured Pruning for Multi-Task Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence should be established despite the differences in the execution processes and appearances between the two videos. We introduce an unsupervised method for alignment that uses global and local features of the frames. In particular, we introduce effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and VGG network. Then the features are processed and combined to construct a multidimensional time series that represent the video. The resulting time series are used to align videos of the same actions using a novel version of dynamic time warping named Diagonalized Dynamic Time Warping (DDTW).
The main advantage of our approach is that no training is required, which makes it applicable for any new type of action without any need to collect training samples for it. For evaluation, we considered video synchronization and phase classification tasks on the Penn action dataset. Also, for an effective evaluation of the video synchronization task, we present a new metric called Enclosed Area Error (EAE). The results show that our method outperforms previous state-of-the-art methods, such as TCC and other self-supervised and supervised methods.", "output": "Video alignment using unsupervised learning of local and global features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recommender systems are increasingly successful in recommending personalized content to users. However, these systems often capitalize on popular content. There is also a continuous evolution of user interests that need to be captured, but there is no direct way to systematically explore users' interests. This also tends to affect the overall quality of the recommendation pipeline as training data is generated from the candidates presented to the user. In this paper, we present a framework for exploration in large-scale recommender systems to address these challenges. It consists of three parts, first the user-creator exploration which focuses on identifying the best creators that users are interested in, second the online exploration framework and third a feed composition mechanism that balances explore and exploit to ensure optimal prevalence of exploratory videos. Our methodology can be easily integrated into an existing large-scale recommender system with minimal modifications.
We also analyze the value of exploration by defining relevant metrics around user-creator connections and understanding how this helps the overall recommendation pipeline with strong online gains in creator and ecosystem value. In contrast to the regression on user engagement metrics generally seen while exploring, our method is able to achieve significant improvements of 3.50% in strong creator connections and 0.85% increase in novel creator connections. Moreover, our work has been deployed in production on Facebook Watch, a popular video discovery and sharing platform serving billions of users.", "output": "PIE: Personalized Interest Exploration for Large-Scale Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robots operating in real-world environments must reason about possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is the problem of confounding, which if left untreated can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems. However, due to a lack of explicit causal semantics, POMDP planning methods are prone to confounding bias and thus in the presence of unobserved confounders may produce underperforming policies. This paper presents a novel causally-informed extension of \"anytime regularized determinized sparse partially observable tree\" (AR-DESPOT), a modern anytime online POMDP planner, using causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline the partial parameterisation of the causal model for planning, from ground truth model data.
We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher performing policies than AR-DESPOT.", "output": "CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vaccine hesitancy continues to be a main challenge for public health officials during the COVID-19 pandemic. As this hesitancy undermines vaccine campaigns, many researchers have sought to identify its root causes, finding that the increasing volume of anti-vaccine misinformation on social media platforms is a key element of this problem. We explored Twitter as a source of misleading content with the goal of extracting overlapping cultural and political beliefs that motivate the spread of vaccine misinformation. To do this, we have collected a data set of vaccine-related Tweets and annotated them with the help of a team of annotators with a background in communications and journalism. Ultimately we hope this can lead to effective and targeted public health communication strategies for reaching individuals with anti-vaccine beliefs. Moreover, this information helps with developing Machine Learning models to automatically detect vaccine misinformation posts and combat their negative impacts. In this paper, we present Vax-Culture, a novel Twitter COVID-19 dataset consisting of 6373 vaccine-related tweets accompanied by an extensive set of human-provided annotations including vaccine-hesitancy stance, indication of any misinformation in tweets, the entities criticized and supported in each tweet and the communicated message of each tweet.
Moreover, we define five baseline tasks including four classification and one sequence generation tasks, and report the results of a set of recent transformer-based models for them. The dataset and code are publicly available at", "output": "Vax-Culture: A Dataset for Studying Vaccine Discourse on Twitter."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources like the Internet. Due to this, biases that we see in platforms online which are a reflection of those in society are in turn captured and learned by these models. These models are deployed in applications that affect millions of people and their inherent biases are harmful to the targeted social groups. In this work, we study the general trend in bias reduction as newer pre-trained models are released. Three recent models (ELECTRA, DeBERTa, and DistilBERT) are chosen and evaluated against two bias benchmarks, StereoSet and CrowS-Pairs. They are compared to the baseline of BERT using the associated metrics.
We explore whether as advancements are made and newer, faster, lighter models are released: are they being developed responsibly such that their inherent social biases have been reduced compared to their older counterparts? The results are compiled and we find that all the models under study do exhibit biases but have generally improved as compared to BERT.", "output": "Evaluation of Social Biases in Recent Large Pre-Trained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language processing, environmental sound analysis, and recently in music information retrieval, e.g. for pitch estimation. Particularly in the context of music, there are few insights about the fragility of these models regarding different distributions of data, and how they could be mitigated. In this paper, we explore these questions by dissecting a self-supervised model for pitch estimation adapted for tempo estimation via rigorous experimentation with synthetic data. Specifically, we study the relationship between the input representation and data distribution for self-supervised tempo estimation.", "output": "Tempo vs. Pitch: understanding self-supervised tempo estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As language models scale up, it becomes increasingly expensive to verify research ideas because conclusions on small models do not trivially transfer to large ones.
A possible solution is to establish a generic system that directly predicts some metrics for large models solely based on the results and hyperparameters from small models. Existing methods based on scaling laws require hyperparameter search on the largest models, which is impractical with limited resources. We address this issue by presenting our discoveries indicating that Maximal Update parametrization (muP) enables accurate fitting of scaling laws for hyperparameters close to common loss basins, without any search. Thus, different models can be directly compared on large scales with loss prediction even before the training starts. We propose a new paradigm as a first step towards reliable academic research for any model scale without heavy computation. Code will be publicly available shortly.", "output": "Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers that are performatively stable, i.e. optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function.
In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.", "output": "Performative Prediction with Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms are increasingly used to make or support decisions in a wide range of settings. With such expansive use there is also growing concern about the fairness of such methods. Prior literature on algorithmic fairness has extensively addressed risks and in many cases presented approaches to manage some of them. However, most studies have focused on fairness issues that arise from actions taken by a (single) focal decision-maker or agent. In contrast, most real-world systems have many agents that work collectively as part of a larger ecosystem. For example, in a lending scenario, there are multiple lenders who evaluate loans for applicants, along with policymakers and other institutions whose decisions also affect outcomes. Thus, the broader impact of any lending decision of a single decision maker will likely depend on the actions of multiple different agents in the ecosystem.
This paper develops formalisms for firm versus systemic fairness, and calls for a greater focus in the algorithmic fairness literature on ecosystem-wide fairness - or more simply systemic fairness - in real-world contexts.", "output": "Systemic Fairness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In most image retrieval systems, images include various high-level semantics, called tags or annotations. Virtually all the state-of-the-art image annotation methods that handle imbalanced labeling are search-based techniques which are time-consuming. In this paper, a novel coupled dictionary learning approach is proposed to learn a limited number of visual prototypes and their corresponding semantics simultaneously. This approach leads to a real-time image annotation procedure. Another contribution of this paper is that utilizes a marginalized loss function instead of the squared loss function that is inappropriate for image annotation with imbalanced labels. We have employed a marginalized loss function in our method to leverage a simple and effective method of prototype updating. Meanwhile, we have introduced ${\\ell}_1$ regularization on semantic prototypes to preserve the sparse and imbalanced nature of labels in learned semantic prototypes. Finally, comprehensive experimental results on various datasets demonstrate the efficiency of the proposed method for image annotation tasks in terms of accuracy and time. The reference implementation is publicly available on ", "output": "Toward Real-Time Image Annotation Using Marginalized Coupled Dictionary Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples.
Moreover, the transferability of the adversarial examples has received broad attention in recent years, which means that adversarial examples crafted by a surrogate model can also attack unknown models. This phenomenon gave birth to the transfer-based adversarial attacks, which aim to improve the transferability of the generated adversarial examples. In this paper, we propose to improve the transferability of adversarial examples in the transfer-based attack via masking unimportant parameters (MUP). The key idea in MUP is to refine the pretrained surrogate models to boost the transfer-based attack. Based on this idea, a Taylor expansion-based metric is used to evaluate the parameter importance score and the unimportant parameters are masked during the generation of adversarial examples. This process is simple, yet can be naturally combined with various existing gradient-based optimizers for generating adversarial examples, thus further improving the transferability of the generated adversarial examples. Extensive experiments are conducted to validate the effectiveness of the proposed MUP-based methods.", "output": "Generating Adversarial Examples with Better Transferability via Masking Unimportant Parameters of Surrogate Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While having achieved great success in rich real-life applications, deep neural network (DNN) models have long been criticized for their vulnerability to adversarial attacks. Tremendous research efforts have been dedicated to mitigating the threats of adversarial attacks, but the essential trait of adversarial examples is not yet clear, and most existing methods are yet vulnerable to hybrid attacks and suffer from counterattacks.
In light of this, in this paper, we first reveal a gradient-based correlation between sensitivity analysis-based DNN interpreters and the generation process of adversarial examples, which indicates the Achilles' heel of adversarial attacks and sheds light on linking together the two long-standing challenges of DNN: fragility and unexplainability. We then propose an interpreter-based ensemble framework called X-Ensemble for robust adversary defense. X-Ensemble adopts a novel detection-rectification process and features in building multiple sub-detectors and a rectifier upon various types of interpretation information toward target classifiers. Moreover, X-Ensemble employs the Random Forests (RF) model to combine sub-detectors into an ensemble detector for adversarial hybrid attacks defense. The non-differentiable property of RF further makes it a precious choice against the counterattack of adversaries. Extensive experiments under various types of state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble over competitive baseline methods.", "output": "Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) has been introduced to the healthcare domain as a decentralized learning paradigm that allows multiple parties to train a model collaboratively without privacy leakage. However, most previous studies have assumed that every client holds an identical label set. In reality, medical specialists tend to annotate only diseases within their knowledge domain or interest. This implies that label sets in each client can be different and even disjoint. In this paper, we propose the framework FedLSM to solve the problem Label Set Mismatch.
FedLSM adopts different training strategies on data with different uncertainty levels to efficiently utilize unlabeled or partially labeled data as well as class-wise adaptive aggregation in the classification layer to avoid inaccurate aggregation when clients have missing labels. We evaluate FedLSM on two public real-world medical image datasets, including chest x-ray (CXR) diagnosis with 112,120 CXR images and skin lesion diagnosis with 10,015 dermoscopy images, and show that it significantly outperforms other state-of-the-art FL algorithms. Code will be made available upon acceptance.", "output": "Scale Federated Learning for Label Set Mismatch in Medical Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The abstract outlines the problem of toxic comments on social media platforms, where individuals use disrespectful, abusive, and unreasonable language that can drive users away from discussions. This behavior is referred to as anti-social behavior, which occurs during online debates, comments, and fights. The comments containing explicit language can be classified into various categories, such as toxic, severe toxic, obscene, threat, insult, and identity hate. This behavior leads to online harassment and cyberbullying, which forces individuals to stop expressing their opinions and ideas. To protect users from offensive language, companies have started flagging comments and blocking users. The abstract proposes to create a classifier using an LSTM-CNN model that can differentiate between toxic and non-toxic comments with high accuracy.
The classifier can help organizations examine the toxicity of the comment section better.", "output": "Classification of social media Toxic comments using Machine learning models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-linear manner. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training, inference) FLOPS for ResNet50 on ImageNet at 80% sparsity. Finally, AutoSparse outperforms sparse-to-sparse SotA method MEST (uniform sparsity) for 80% sparse ResNet50 with similar accuracy, where MEST uses 12% more training FLOPS and 50% more inference FLOPS.", "output": "AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In cross-device Federated Learning (FL) environments, scaling synchronous FL methods is challenging as stragglers hinder the training process.
Moreover, the availability of each client to join the training is highly variable over time due to system heterogeneities and intermittent connectivity. Recent asynchronous FL methods (e.g., FedBuff) have been proposed to overcome these issues by allowing slower users to continue their work on local training based on stale models and to contribute to aggregation when ready. However, we show empirically that this method can lead to a substantial drop in training accuracy as well as a slower convergence rate. The primary reason is that fast-speed devices contribute to many more rounds of aggregation while others join more intermittently or not at all, and with stale model updates. To overcome this barrier, we propose TimelyFL, a heterogeneity-aware asynchronous FL framework with adaptive partial training. During the training, TimelyFL adjusts the local training workload based on the real-time resource capabilities of each client, aiming to allow more available clients to join in the global update without staleness. We demonstrate the performance benefits of TimelyFL by conducting extensive experiments on various datasets (e.g., CIFAR-10, Google Speech, and Reddit) and models (e.g., ResNet20, VGG11, and ALBERT). In comparison with the state-of-the-art (i.e., FedBuff), our evaluations reveal that TimelyFL improves participation rate by 21.13%, harvests 1.28x - 2.89x more efficiency on convergence rate, and provides a 6.25% increment on test accuracy.", "output": "TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hypertension is a medical condition characterized by high blood pressure, and classifying it into its various stages is crucial to managing the disease.
In this project, a novel method is proposed for classifying stages of hypertension using Photoplethysmography (PPG) signals and deep learning models, namely AvgPool_VGG-16. The PPG signal is a non-invasive method of measuring blood pressure through the use of light sensors that measure the changes in blood volume in the microvasculature of tissues. PPG images from the publicly available blood pressure classification dataset were used to train the model. Multiclass classification for various PPG stages was done. The results show the proposed method achieves high accuracy in classifying hypertension stages, demonstrating the potential of PPG signals and deep learning models in hypertension diagnosis and management.", "output": "PPG Signals for Hypertension Diagnosis: A Novel Method using Deep Learning Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding the COVID-19 vaccine hesitancy, such as who and why, is very crucial since a large-scale vaccine adoption remains as one of the most efficient methods of controlling the pandemic. Such an understanding also provides insights into designing successful vaccination campaigns for future pandemics. Unfortunately, there are many factors involved in deciding whether to take the vaccine, especially from the cultural point of view. To obtain these goals, we design a novel culture-aware machine learning (ML) model, based on our new data collection, for predicting vaccination willingness. We further analyze the most important features which contribute to the ML model's predictions using advanced AI explainers such as the Probabilistic Graphical Model (PGM) and Shapley Additive Explanations (SHAP). These analyses reveal the key factors that most likely impact the vaccine adoption decisions.
Our findings show that Hispanic and African American communities are most likely impacted by cultural characteristics such as religions and ethnic affiliation, whereas vaccine trust and approval influence the Asian communities the most. Our results also show that cultural characteristics, rumors, and political affiliation are associated with increased vaccine rejection.", "output": "Cultural-aware Machine Learning based Analysis of COVID-19 Vaccine Hesitancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Soft-thresholding has been widely used in neural networks. Its basic network structure is a two-layer convolution neural network with soft-thresholding. Due to the network's nature of nonlinearity and nonconvexity, the training process heavily depends on an appropriate initialization of network parameters, resulting in the difficulty of obtaining a globally optimal solution. To address this issue, a convex dual network is designed here. We theoretically analyze the network convexity and numerically confirm that the strong duality holds. This conclusion is further verified in the linear fitting and denoising experiments. This work provides a new way to convexify soft-thresholding neural networks.", "output": "Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Depth Estimation has wide-reaching applications in the field of Computer vision such as target tracking, augmented reality, and self-driving cars. The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input. The traditional depth estimation methods are based on depth cues and used concepts like epipolar geometry.
With the evolution of Convolutional Neural Networks, depth estimation has undergone tremendous strides. In this project, our aim is to explore possible extensions to existing SoTA Deep Learning based Depth Estimation Models and to see whether performance metrics could be further improved. In a broader sense, we are looking at the possibility of implementing Pose Estimation, Efficient Sub-Pixel Convolution Interpolation, Semantic Segmentation Estimation techniques to further enhance our proposed architecture and to provide fine-grained and more globally coherent depth map predictions. We also plan to do away with camera intrinsic parameters during training and apply weather augmentations to further generalize our model.", "output": "Self-Supervised Learning based Depth Estimation from Monocular Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning new classes without forgetting is crucial for real-world applications of a classification model. Vision Transformers (ViT) recently achieve remarkable performance in Class Incremental Learning (CIL). Previous works mainly focus on block design and model expansion for ViTs. However, in this paper, we find that when the ViT is incrementally trained, the attention layers gradually lose concentration on local features. We call this interesting phenomenon \\emph{Locality Degradation} in ViTs for CIL. Since the low-level local information is crucial to the transferability of the representation, it is beneficial to preserve the locality in attention layers. In this paper, we encourage the model to preserve more local information as the training procedure goes on and devise a Locality-Preserved Attention (LPA) layer to emphasize the importance of local features.
Specifically, we incorporate the local information directly into the vanilla attention and control the initial gradients of the vanilla attention by weighting it with a small initial value. Extensive experiments show that the representations facilitated by LPA capture more low-level general information which is easier to transfer to follow-up tasks. The improved model gets consistently better performance on CIFAR100 and ImageNet100.", "output": "Preserving Locality in Vision Transformers for Class Incremental Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven prediction of fluid flow and temperature distribution in marine and aerospace engineering has received extensive research and demonstrated its potential in real-time prediction recently. However, usually large amounts of high-fidelity data are required to describe and accurately predict the complex physical information, while in reality, only limited high-fidelity data is available due to the high experiment/computational cost. Therefore, this work proposes a novel multi-fidelity learning method based on the Fourier Neural Operator by jointly using abundant low-fidelity data and limited high-fidelity data under a transfer learning paradigm. First, as a resolution-invariant operator, the Fourier Neural Operator is first and gainfully applied to integrate multi-fidelity data directly, which can utilize the scarce high-fidelity data and abundant low-fidelity data simultaneously. Then, the transfer learning framework is developed for the current task by extracting the rich low-fidelity data knowledge to assist high-fidelity model training, to further improve data-driven prediction accuracy. Finally, three typical fluid and temperature prediction problems are chosen to validate the accuracy of the proposed multi-fidelity model.
The results demonstrate that our proposed method has high effectiveness when compared with other high-fidelity models, and has a high modeling accuracy of 99% for all the selected physical field problems. Significantly, the proposed multi-fidelity learning method has the potential of a simple structure with high precision, which can provide a reference for the construction of the subsequent model.", "output": "Multi-fidelity prediction of fluid flow and temperature field based on transfer learning using Fourier Neural Operator."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, communication engineers put strong emphasis on artificial neural network (ANN)-based algorithms with the aim of increasing the flexibility and autonomy of the system and its components. In this context, unsupervised training is of special interest as it enables adaptation without the overhead of transmitting pilot symbols. In this work, we present a novel ANN-based, unsupervised equalizer and its trainable field programmable gate array (FPGA) implementation. We demonstrate that our custom loss function allows the ANN to adapt for varying channel conditions, approaching the performance of a supervised baseline.
Furthermore, as a first step towards a practical communication system, we design an efficient FPGA implementation of our proposed algorithm, which achieves a throughput in the order of Gbit/s, outperforming a high-performance GPU by a large margin.", "output": "Unsupervised ANN-Based Equalizer and Its Trainable FPGA Implementation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Convolutional Network (GCN) with the powerful capacity to explore graph-structural data has gained noticeable success in recent years. Nonetheless, most of the existing GCN-based models suffer from the notorious over-smoothing issue, owing to which shallow networks are extensively adopted. This may be problematic for complex graph datasets because a deeper GCN should be beneficial to propagating information across remote neighbors. Recent works have devoted effort to addressing over-smoothing problems, including establishing residual connection structure or fusing predictions from multi-layer models. Because of the indistinguishable embeddings from deep layers, it is reasonable to generate more reliable predictions before conducting the combination of outputs from various layers. In light of this, we propose an Alternating Graph-regularized Neural Network (AGNN) composed of Graph Convolutional Layer (GCL) and Graph Embedding Layer (GEL). GEL is derived from the graph-regularized optimization containing Laplacian embedding term, which can alleviate the over-smoothing problem by periodic projection from the low-order feature space onto the high-order space. With more distinguishable features of distinct layers, an improved Adaboost strategy is utilized to aggregate outputs from each layer, which explores integrated embeddings of multi-hop neighbors.
The proposed model is evaluated via a large number of experiments including performance comparison with some multi-layer or multi-order graph neural networks, which reveals the superior performance improvement of AGNN compared with state-of-the-art models.", "output": "AGNN: Alternating Graph-Regularized Neural Networks to Alleviate Over-Smoothing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening billions of compounds. For example, a successful approach is representing the molecules as a graph and utilizing graph neural networks (GNN). Yet, these approaches still require experimental measurements of thousands of compounds to construct a proper training set. While in some domains it is easier to acquire experimental data, in others it might be more limited. For example, it is easier to test the compounds on bacteria than perform in-vivo experiments. Thus, a key question is how to utilize information from a large available dataset together with a small subset of compounds where both domains are measured to predict compounds' effect on the second, experimentally less available domain. Current transfer learning approaches for drug discovery, including training of pre-trained modules or meta-learning, have limited success. In this work, we develop a novel method, named Symbiotic Message Passing Neural Network (SMPNN), for merging graph-neural-network models from different domains. By routing new message passing lanes between them, our approach resolves some of the potential conflicts between the different domains, and implicit constraints induced by the larger datasets.
By collecting public data and performing additional high-throughput experiments, we demonstrate the advantage of our approach by predicting anti-fungal activity from anti-bacterial activity. We compare our method to the standard transfer learning approach and show that SMPNN provided better and less variable performance. Our approach is general and can be used to facilitate information transfer between any two domains such as different organisms, different organelles, or different environments.", "output": "Symbiotic Message Passing Model for Transfer Learning between Anti-Fungal and Anti-Bacterial Domains."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Efficient deep learning-based approaches have achieved remarkable performance in single image super-resolution. However, recent studies on efficient super-resolution have mainly focused on reducing the number of parameters and floating-point operations through various network designs. Although these methods can decrease the number of parameters and floating-point operations, they may not necessarily reduce actual running time. To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance. Specifically, we leverage enhanced high-resolution output as additional supervision to improve the learning ability of lightweight student networks. Upon convergence of the student network, we further simplify our network structure to a more lightweight level using reparameterization techniques and iterative network pruning. Meanwhile, we adopt an effective lightweight network training strategy that combines multi-anchor distillation and progressive learning, enabling the lightweight network to achieve outstanding performance.
Ultimately, our proposed method achieves the fastest inference time among all participants in the NTIRE 2023 efficient super-resolution challenge while maintaining competitive super-resolution performance. Additionally, extensive experiments are conducted to demonstrate the effectiveness of the proposed components. The results show that our approach achieves comparable performance on the representative dataset DIV2K, both qualitatively and quantitatively, with faster inference and fewer network parameters.", "output": "DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Irregularly measured time series are common in many of the applied settings in which time series modelling is a key statistical tool, including medicine. This provides challenges in model choice, often necessitating imputation or similar strategies. Continuous time autoregressive recurrent neural networks (CTRNNs) are a deep learning model that accounts for irregular observations through incorporating continuous evolution of the hidden states between observations. This is achieved using a neural ordinary differential equation (ODE) or neural flow layer. In this manuscript, we give an overview of these models, including the varying architectures that have been proposed to account for issues such as ongoing medical interventions. Further, we demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting using electronic medical record and simulated data. The experiments confirm that addition of a neural ODE or neural flow layer generally improves the performance of autoregressive recurrent neural networks in the irregular measurement setting.
However, several CTRNN architectures are outperformed by an autoregressive gradient boosted tree model (Catboost), with only a long short-term memory (LSTM) and neural ODE based architecture (ODE-LSTM) achieving comparable performance on probabilistic forecasting metrics such as the continuous ranked probability score (ODE-LSTM: 0.118$\\pm$0.001; Catboost: 0.118$\\pm$0.001), ignorance score (0.152$\\pm$0.008; 0.149$\\pm$0.002) and interval score (175$\\pm$1; 176$\\pm$1).", "output": "Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Long-term stability is a critical property for deep learning-based data-driven digital twins of the Earth system. Such data-driven digital twins enable sub-seasonal and seasonal predictions of extreme environmental events, probabilistic forecasts that require a large number of ensemble members, and computationally tractable high-resolution Earth system models where expensive components of the models can be replaced with cheaper data-driven surrogates. Owing to computational cost, physics-based digital twins, though long-term stable, are intractable for real-time decision-making. Data-driven digital twins offer a cheaper alternative to them and can provide real-time predictions.
However, such digital twins can only provide short-term forecasts accurately since they become unstable when time-integrated beyond 20 days. Currently, the cause of the instabilities is unknown, and the methods that are used to improve their stability horizons are ad-hoc and lack rigorous theory. In this paper, we reveal that the universal causal mechanism for these instabilities in any turbulent flow is due to \\textit{spectral bias} wherein \\textit{any} deep learning architecture is biased to learn only the large-scale dynamics and ignores the small scales completely. We further elucidate how turbulence physics and the absence of convergence in deep learning-based time-integrators amplify this bias leading to unstable error propagation. Finally, using the quasigeostrophic flow and ECMWF Reanalysis data as test cases, we bridge the gap between deep learning theory and fundamental numerical analysis to propose one mitigative solution to such instabilities. We develop long-term stable data-driven digital twins for the climate system and demonstrate accurate short-term forecasts, and hundreds of years of long-term stable time-integration with accurate mean and variability.", "output": "Long-term instabilities of deep learning-based digital twins of the climate system: The cause and a solution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The exploitation of visible spectrum datasets has led deep networks to show remarkable success. However, real-world tasks include low-lighting conditions, which give rise to performance bottlenecks for models trained on large-scale RGB image datasets. Thermal IR cameras are more robust against such conditions. Therefore, the usage of thermal imagery in real-world applications can be useful. Unsupervised domain adaptation (UDA) allows transferring information from a source domain to a fully unlabeled target domain.
Despite substantial improvements in UDA, the performance gap between UDA and its supervised learning counterpart remains significant. By picking a small number of target samples to annotate and using them in training, active domain adaptation tries to mitigate this gap with minimum annotation expense. We propose an active domain adaptation method in order to examine the efficiency of combining the visible spectrum and thermal imagery modalities. When the domain gap is considerably large as in the visible-to-thermal task, we may conclude that the methods without explicit domain alignment cannot achieve their full potential. To this end, we propose a spectral transfer guided active domain adaptation method to select the most informative unlabeled target samples while aligning source and target domains. We used the large-scale visible spectrum dataset MS-COCO as the source domain and the thermal dataset FLIR ADAS as the target domain to present the results of our method. Extensive experimental evaluation demonstrates that our proposed method outperforms the state-of-the-art active domain adaptation methods. The code and models are publicly available.", "output": "Spectral Transfer Guided Active Domain Adaptation For Thermal Imagery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consistent with the assessment results by human experts. Furthermore, the overall quality of a scan and the correlation between the quality of frames should not be overlooked.
In this work, we propose a reinforcement learning framework powered by two hierarchical agents that collaboratively learn to perform both frame-level and video-level quality assessments. It is equipped with a specially-designed reward mechanism that considers temporal dependency among frame quality and only requires sparse binary annotations to train. Experimental results on a challenging fetal brain dataset verify that the proposed framework could perform dual-level quality assessment and its predictions correlate well with the subjective assessment results.", "output": "Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "PAC-Bayes learning is an established framework to assess the generalisation ability of a learning algorithm during the training phase. However, it remains challenging to know whether PAC-Bayes is useful to understand, before training, why the output of well-known algorithms generalises well. We positively answer this question by expanding the \\emph{Wasserstein PAC-Bayes} framework, briefly introduced in \\cite{amit2022ipm}. We provide new generalisation bounds exploiting geometric assumptions on the loss function.
Using our framework, we prove, before any training, that the output of an algorithm from \\citet{lambert2022variational} has a strong asymptotic generalisation ability. More precisely, we show that it is possible to incorporate optimisation results within a generalisation framework, building a bridge between PAC-Bayes and optimisation algorithms.", "output": "Wasserstein PAC-Bayes Learning: A Bridge Between Generalisation and Optimisation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work proposes to reduce visibility data volume using a baseline-dependent lossy compression technique that preserves smearing at the edges of the field-of-view. We exploit the relation of the rank of a matrix and the fact that a low-rank approximation can describe the raw visibility data as a sum of basic components where each basic component corresponds to a specific Fourier component of the sky distribution. As such, the entire visibility data is represented as a collection of data matrices from baselines, instead of a single tensor. The proposed methods are formulated as follows: provided a large dataset of the entire visibility data, the first algorithm, named $simple~SVD$, projects the data into a regular sampling space of rank$-r$ data matrices. In this space, the data for all the baselines has the same rank, which makes the compression factor equal across all baselines. The second algorithm, named $BDSVD$, projects the data into an irregular sampling space of rank$-r_{pq}$ data matrices. The subscript $pq$ indicates that the rank of the data matrix varies across baselines $pq$, which makes the compression factor baseline-dependent.
MeerKAT and the European Very Long Baseline Interferometry Network are used as reference telescopes to evaluate and compare the performance of the proposed methods against traditional methods, such as traditional averaging and baseline-dependent averaging (BDA). For the same spatial resolution threshold, both $simple~SVD$ and $BDSVD$ show effective compression by two orders of magnitude higher than traditional averaging and BDA. At the same space-saving rate, there is no decrease in spatial resolution and there is a reduction in the noise variance in the data which improves the S/N to over $1.5$ dB at the edges of the field-of-view.", "output": "Lossy Compression of Large-Scale Radio Interferometric Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at:", "output": "Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reasoning on knowledge graphs is a challenging task because it utilizes observed information to predict the missing one. Specifically, answering first-order logic formulas is of particular interest because of its clear syntax and semantics. Recently, the query embedding method has been proposed, which learns the embedding of a set of entities and treats logic operations as set operations. Though there has been much research following the same methodology, it lacks a systematic inspection from the standpoint of logic. In this paper, we characterize the scope of queries investigated previously and precisely identify the gap between it and the whole family of existential formulas. Moreover, we develop a new dataset containing ten new formulas and discuss the new challenges coming simultaneously.
Finally, we propose a new search algorithm from fuzzy logic theory which is capable of solving new formulas and outperforming the previous methods on existing formulas.", "output": "On Existential First Order Queries Inference on Knowledge Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The highly structured energy landscape of the loss as a function of parameters for deep neural networks makes it necessary to use sophisticated optimization strategies in order to discover (local) minima that guarantee reasonable performance. Overcoming less suitable local minima is an important prerequisite and often momentum methods are employed to achieve this. As in other non local optimization procedures, this however creates the necessity to balance between exploration and exploitation. In this work, we suggest an event based control mechanism for switching from exploration to exploitation based on reaching a predefined reduction of the loss function. As we give the momentum method a port Hamiltonian interpretation, we apply the 'heavy ball with friction' interpretation and trigger breaking (or friction) when achieving certain goals. We benchmark our method against standard stochastic gradient descent and provide experimental evidence for improved performance of deep neural networks when our strategy is applied.", "output": "Who breaks early, looses: goal oriented training of deep neural networks based on port Hamiltonian dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The transportation sector accounts for about 25% of global greenhouse gas emissions. Therefore, an improvement of energy efficiency in the traffic sector is crucial to reducing the carbon footprint.
Efficiency is typically measured in terms of energy use per traveled distance, e.g. liters of fuel per kilometer. Leading factors that impact the energy efficiency are the type of vehicle, environment, driver behavior, and weather conditions. These varying factors introduce uncertainty in estimating the vehicles' energy efficiency. We propose in this paper an ensemble learning approach based on deep neural networks (ENN) that is designed to reduce the predictive uncertainty and to output measures of such uncertainty. We evaluated it using the publicly available Vehicle Energy Dataset (VED) and compared it with several baselines per vehicle and energy type. The results showed a high predictive performance and they allowed to output a measure of predictive uncertainty.", "output": "Uncertainty-Aware Vehicle Energy Efficiency Prediction using an Ensemble of Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion probabilistic models have been successful in generating high-quality and diverse images. However, traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements, making them less practical for edge devices. Previous approaches for generative adversarial networks proposed a patch-based method that uses positional encoding and global content information. Nevertheless, designing a patch-based approach for diffusion probabilistic models is non-trivial. In this paper, we present a diffusion probabilistic model that generates images on a patch-by-patch basis. We propose two conditioning methods for patch-based generation. First, we propose position-wise conditioning using one-hot representation to ensure patches are in proper positions. Second, we propose Global Content Conditioning (GCC) to ensure patches have coherent content when concatenated together.
We evaluate our model qualitatively and quantitatively on CelebA and LSUN bedroom datasets and demonstrate a moderate trade-off between maximum memory consumption and generated image quality. Specifically, when an entire image is divided into 2 x 2 patches, our proposed approach can reduce the maximum memory consumption by half while maintaining comparable image quality.", "output": "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS leverages the rich generative prior of text-to-image diffusion models and can be used as a loss term in an optimization problem to steer an image towards a desired direction dictated by a text. DDS utilizes the Score Distillation Sampling (SDS) mechanism for the purpose of image editing. We show that using only SDS often produces non-detailed and blurry outputs due to noisy gradients. To address this issue, DDS uses a prompt that matches the input image to identify and remove undesired erroneous directions of SDS. Our key premise is that SDS should be zero when calculated on pairs of matched prompts and images, meaning that if the score is non-zero, its gradients can be attributed to the erroneous component of SDS. Our analysis demonstrates the competence of DDS for text-based image-to-image translation. We further show that DDS can be used to train an effective zero-shot image translation model.
Experimental results indicate that DDS outperforms existing methods in terms of stability and quality, highlighting its potential for real-world applications in text-based image editing.", "output": "Delta Denoising Score."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Alzheimer's Disease (AD), which is the most common cause of dementia, is a progressive disease preceded by Mild Cognitive Impairment (MCI). Early detection of the disease is crucial for making treatment decisions. However, most of the literature on computer-assisted detection of AD focuses on classifying brain images into one of three major categories: healthy, MCI, and AD; or categorising MCI patients into one of (1) progressive: those who progress from MCI to AD at a future examination time during a given study period, and (2) stable: those who stay as MCI and never progress to AD. This misses the opportunity to accurately identify the trajectory of progressive MCI patients. In this paper, we revisit the brain image classification task for AD identification and re-frame it as an ordinal classification task to predict how close a patient is to the severe AD stage. To this end, we select progressive MCI patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and construct an ordinal dataset with a prediction target that indicates the time to progression to AD. We train a siamese network model to predict the time to onset of AD based on MRI brain images. We also propose a weighted variety of siamese networks and compare its performance to a baseline model.
Our evaluations show that incorporating a weighting factor to siamese networks brings considerable performance gain at predicting how close input brain MRI images are to progressing to AD.", "output": "Weighted Siamese Network to Predict the Time to Onset of Alzheimer's Disease from MRI Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper summarizes our contributions to the document-grounded dialog tasks at the 9th and 10th Dialog System Technology Challenges (DSTC9 and DSTC10). In both iterations the task consists of three subtasks: first detect whether the current turn is knowledge seeking, second select a relevant knowledge document, and third generate a response grounded on the selected document. For DSTC9 we proposed different approaches to make the selection task more efficient. The best method, Hierarchical Selection, actually improves the results compared to the original baseline and gives a speedup of 24x. In the DSTC10 iteration of the task, the challenge was to adapt systems trained on written dialogs to perform well on noisy automatic speech recognition transcripts. Therefore, we proposed data augmentation techniques to increase the robustness of the models as well as methods to adapt the style of generated responses to fit well into the proceeding dialog. Additionally, we proposed a noisy channel model that allows for increasing the factuality of the generated responses.
In addition to summarizing our previous contributions, in this work, we also report on a few small improvements and reconsider the automatic evaluation metrics for the generation task which have shown a low correlation to human judgments.", "output": "Task-oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Explainability in yield prediction helps us fully explore the potential of machine learning models that are already able to achieve high accuracy for a variety of yield prediction scenarios. The data included for the prediction of yields are intricate and the models are often difficult to understand. However, understanding the models can be simplified by using natural groupings of the input features. Grouping can be achieved, for example, by the time the features are captured or by the sensor used to do so. The state-of-the-art for interpreting machine learning models is currently defined by the game-theoretic approach of Shapley values. To handle groups of features, the calculated Shapley values are typically added together, ignoring the theoretical limitations of this approach. We explain the concept of Shapley values directly computed for predefined groups of features and introduce an algorithm to compute them efficiently on tree structures. We provide a blueprint for designing swarm plots that combine many local explanations for global understanding.
Extensive evaluation of two different yield prediction problems shows the worth of our approach and demonstrates how we can enable a better understanding of yield prediction models in the future, ultimately leading to mutual enrichment of research and application.", "output": "Grouping Shapley Value Feature Importances of Random Forests for explainable Yield Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "By formulating data samples' formation as a Markov denoising process, diffusion models achieve state-of-the-art performances in a collection of tasks. Recently, many variants of diffusion models have been proposed to enable controlled sample generation. Most of these existing methods either formulate the controlling information as an input (i.e., conditional representation) for the noise approximator, or introduce a pre-trained classifier in the test-phase to guide the Langevin dynamic towards the conditional goal. However, the former line of methods only works when the controlling information can be formulated as conditional representations, while the latter requires the pre-trained guidance classifier to be differentiable. In this paper, we propose a novel framework named RGDM (Reward-Guided Diffusion Model) that guides the training-phase of diffusion models via reinforcement learning (RL). The proposed training framework bridges the objective of weighted log-likelihood and maximum entropy RL, which enables calculating policy gradients via samples from a pay-off distribution proportional to exponentially scaled rewards, rather than from policies themselves. Such a framework alleviates the high gradient variances and enables diffusion models to explore for highly rewarded samples in the reverse process.
Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.", "output": "Towards Controllable Diffusion Models via Reward-Guided Exploration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current machine learning models produce outstanding results in many areas but, at the same time, suffer from shortcut learning and spurious correlations. To address such flaws, the explanatory interactive machine learning (XIL) framework has been proposed to revise a model by employing user feedback on a model's explanation. This work sheds light on the explanations used within this framework. In particular, we investigate simultaneous model revision through multiple explanation methods. To this end, we identified that textit{one explanation does not fit XIL} and propose considering multiple ones when revising models via XIL.", "output": "One Explanation Does Not Fit XIL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point clouds are widely regarded as one of the best dataset types for urban mapping purposes. Hence, point cloud datasets are commonly investigated as benchmark types for various urban interpretation methods. Yet, few researchers have addressed the use of point cloud benchmarks for fac{c}ade segmentation. Robust fac{c}ade segmentation is becoming a key factor in various applications ranging from simulating autonomous driving functions to preserving cultural heritage. In this work, we present a method of enriching existing point cloud datasets with fac{c}ade-related classes that have been designed to facilitate fac{c}ade segmentation testing.
We propose how to efficiently extend existing datasets and comprehensively assess their potential for fac{c}ade segmentation. We use the method to create the TUM-FAc{C}ADE dataset, which extends the capabilities of TUM-MLS-2016. Not only can TUM-FAc{C}ADE facilitate the development of point-cloud-based fac{c}ade segmentation tasks, but our procedure can also be applied to enrich further datasets.", "output": "TUM-FA\\c{C}ADE: Reviewing and enriching point cloud benchmarks for fa\\c{c}ade segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WSJ0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of signal length distribution and its impact on training efficiency. It is demonstrated that, for specific distributions, applying specific TSL limits results in better performance. This is shown to be mainly due to randomly sampling the start index of the waveforms resulting in more unique examples for training. A SepFormer model trained using a TSL limit of 4.42s and dynamic mixing (DM) is shown to match the best-performing SepFormer model trained with DM and unlimited signal lengths.
Furthermore, the 4.42s TSL limit results in a 44% reduction in training time with WHAMR.", "output": "On Data Sampling Strategies for Training Neural Network Speech Separation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cancer is a highly heterogeneous condition that can occur almost anywhere in the human body. 18F-fluorodeoxyglucose is an imaging modality commonly used to detect cancer due to its high sensitivity and clear visualisation of the pattern of metabolic activity. Nonetheless, as cancer is highly heterogeneous, it is challenging to train general-purpose discriminative cancer detection models, with data availability and disease complexity often cited as a limiting factor. Unsupervised anomaly detection models have been suggested as a putative solution. These models learn a healthy representation of tissue and detect cancer by predicting deviations from the healthy norm, which requires models capable of accurately learning long-range interactions between organs and their imaging patterns with high levels of expressivity. Such characteristics are suitably satisfied by transformers, which have been shown to generate state-of-the-art results in unsupervised anomaly detection by training on normal data. This work expands upon such approaches by introducing multi-modal conditioning of the transformer via cross-attention i.e. supplying anatomical reference from paired CT.
Using 294 whole-body PET/CT samples, we show that our anomaly detection method is robust and capable of achieving accurate cancer localization results even in cases where normal training data is unavailable. In addition, we show the efficacy of this approach on out-of-sample data showcasing the generalizability of this approach with limited training data. Lastly, we propose to combine model uncertainty with a new kernel density estimation approach, and show that it provides clinically and statistically significant improvements when compared to the classic residual-based anomaly maps. Overall, a superior performance is demonstrated against leading state-of-the-art alternatives, drawing attention to the potential of these approaches.", "output": "Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing penetration of renewable power sources such as wind and solar, accurate short-term, nowcasting renewable power prediction is becoming increasingly important. This paper investigates the multi-modal (MM) learning and end-to-end (E2E) learning for nowcasting renewable power as an intermediate to energy management systems. MM combines features from all-sky imagery and meteorological sensor data as two modalities to predict renewable power generation that otherwise could not be combined effectively. The combined, predicted values are then input to a differentiable optimal power flow (OPF) formulation simulating the energy management. For the first time, MM is combined with E2E training of the model that minimises the expected total system cost. The case study tests the proposed methodology on the real sky and meteorological data from the Netherlands.
In our study, the proposed MM-E2E model reduced system cost by 30% compared to uni-modal baselines.", "output": "End-to-End Learning with Multiple Modalities for System-Optimised Renewables Nowcasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Subgraph-enhanced graph neural networks (SGNN) can increase the expressive power of the standard message-passing framework. This model family represents each graph as a collection of subgraphs, generally extracted by random sampling or with hand-crafted heuristics. Our key observation is that by selecting \"meaningful\" subgraphs, besides improving the expressivity of a GNN, it is also possible to obtain interpretable results. For this purpose, we introduce a novel framework that jointly predicts the class of the graph and a set of explanatory sparse subgraphs, which can be analyzed to understand the decision process of the classifier. We compare the performance of our framework against standard subgraph extraction policies, like random node/edge deletion strategies. The subgraphs produced by our framework allow to achieve comparable performance in terms of accuracy, with the additional benefit of providing explanations.", "output": "Combining Stochastic Explainers and Subgraph Neural Networks can Increase Expressivity and Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A key challenge for a reinforcement learning (RL) agent is to incorporate external/expert advice in its learning. The desired goals of an algorithm that can shape the learning of an RL agent with external advice include (a) maintaining policy invariance; (b) accelerating the learning of the agent; and (c) learning from arbitrary advice [3].
To address this challenge, this paper formulates the problem of incorporating external advice in RL as a multi-armed bandit called shaping-bandits. The reward of each arm of shaping bandits corresponds to the return obtained by following the expert or by following a default RL algorithm learning on the true environment reward. We show that directly applying existing bandit and shaping algorithms that do not reason about the non-stationary nature of the underlying returns can lead to poor results. Thus we propose UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES (LPIES), three different shaping algorithms built on different assumptions that reason about the long-term consequences of following the expert policy or the default RL algorithm. Our experiments in four different settings show that these proposed algorithms achieve the above-mentioned goals whereas the other algorithms fail to do so.", "output": "Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Solar activity is one of the main drivers of variability in our solar system and the key source of space weather phenomena that affect Earth and near Earth space. The extensive record of high resolution extreme ultraviolet (EUV) observations from the Solar Dynamics Observatory (SDO) offers an unprecedented, very large dataset of solar images. In this work, we make use of this comprehensive dataset to investigate capabilities of current state-of-the-art generative models to accurately capture the data distribution behind the observed solar activity states. Starting from StyleGAN-based methods, we uncover severe deficits of this model family in handling fine-scale details of solar images when training on high resolution samples, contrary to training on natural face images.
When switching to the diffusion based generative model family, we observe strong improvements of fine-scale detail generation. For the GAN family, we are able to achieve similar improvements in fine-scale generation when turning to ProjectedGANs, which uses multi-scale discriminators with a pre-trained frozen feature extractor. We conduct ablation studies to clarify mechanisms responsible for proper fine-scale handling. Using distributed training on supercomputers, we are able to train generative models for up to 1024x1024 resolution that produce high quality samples indistinguishable to human experts, as suggested by the evaluation we conduct. We make all code, models and workflows used in this study publicly available at url{", "output": "A Comparative Study on Generative Models for High Resolution Solar Observation Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "GPT-3 and several other language models (LMs) can effectively address various natural language processing (NLP) tasks, including machine translation and text summarization. Recently, they have also been successfully employed in the business process management (BPM) domain, e.g., for predictive process monitoring and process extraction from text. This, however, typically requires fine-tuning the employed LM, which, among others, necessitates large amounts of suitable training data. A possible solution to this problem is the use of prompt engineering, which leverages pre-trained LMs without fine-tuning them. Recognizing this, we argue that prompt engineering can help bring the capabilities of LMs to BPM research.
We use this position paper to develop a research agenda for the use of prompt engineering for BPM research by identifying the associated potentials and challenges.", "output": "Just Tell Me: Prompt Engineering in Business Process Management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Complex networked systems in fields such as physics, biology, and social sciences often involve interactions that extend beyond simple pairwise ones. Hypergraphs serve as powerful modeling tools for describing and analyzing the intricate behaviors of systems with multi-body interactions. Herein, we investigate a discrete-time nonlinear averaging dynamics with three-body interactions: an underlying hypergraph, comprising triples as hyperedges, delineates the structure of these interactions, while the vertices update their states through a weighted, state-dependent average of neighboring pairs' states. This dynamics captures reinforcing group effects, such as peer pressure, and exhibits higher-order dynamical effects resulting from a complex interplay between initial states, hypergraph topology, and nonlinearity of the update. Differently from linear averaging dynamics on graphs with two-body interactions, this model does not converge to the average of the initial states but rather induces a shift. By assuming random initial states and by making some regularity and density assumptions on the hypergraph, we prove that the dynamics converges to a multiplicatively-shifted average of the initial states, with high probability.
We further characterize the shift as a function of two parameters describing the initial state and interaction strength, as well as the convergence time as a function of the hypergraph structure.", "output": "On the convergence of nonlinear averaging dynamics with three-body interactions on hypergraphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.", "output": "Measuring Re-identification Risk."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vegetation structure mapping is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation.
Repeat measurements of these data allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of tree canopy height and crown projected area at a high spatial resolution are also important for monitoring carbon fluxes and assessing tree-based land uses, since forest structures can be highly spatially heterogeneous, especially in agroforestry systems. Very high resolution satellite imagery (less than one meter (1m) ground sample distance) makes it possible to extract information at the tree level while allowing monitoring at a very large scale. This paper presents the first high-resolution canopy height map concurrently produced for multiple sub-national jurisdictions. Specifically, we produce canopy height maps for the states of California and S~{a}o Paolo, at sub-meter resolution, a significant improvement over the ten meter (10m) resolution of previous Sentinel / GEDI based worldwide maps of canopy height. The maps are generated by applying a vision transformer to features extracted from a self-supervised model in Maxar imagery from 2017 to 2020, and are trained against aerial lidar and GEDI observations. We evaluate the proposed maps with set-aside validation lidar data as well as by comparing with other remotely sensed maps and field-collected data, and find our model produces an average Mean Absolute Error (MAE) within set-aside validation areas of 3.0 meters.", "output": "Sub-meter resolution canopy height maps using self-supervised learning and a vision transformer trained on Aerial and GEDI Lidar."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the last few years, we have not seen any major developments in model-free or model-based learning methods that would make one obsolete relative to the other.
In most cases, the used technique is heavily dependent on the use case scenario or other attributes, e.g. the environment. Both approaches have their own advantages, for example, sample efficiency or computational efficiency. However, when combining the two, the advantages of each can be combined and hence achieve better performance. The TD-MPC framework is an example of this approach. On the one hand, a world model in combination with model predictive control is used to get a good initial estimate of the value function. On the other hand, a Q function is used to provide a good long-term estimate. Similar to algorithms like MuZero a latent state representation is used, where only task-relevant information is encoded to reduce the complexity. In this paper, we propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation given the internal state representation. This allows our agent to have a more stable learning signal during training and also improves sample efficiency. Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks from the DeepMind-Control suite.", "output": "Model Predictive Control with Self-supervised Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of the Internet of Things (IoT), network intrusion detection is becoming more complex and extensive. It is essential to investigate an intelligent, automated, and robust network intrusion detection method. Graph neural networks based network intrusion detection methods have been proposed. However, it still needs further studies because the graph construction method of the existing methods does not fully adapt to the characteristics of the practical network intrusion datasets.
To address the above issue, this paper proposes a graph neural network algorithm based on behavior similarity (BS-GAT) using graph attention network. First, a novel graph construction method is developed using the behavior similarity by analyzing the characteristics of the practical datasets. The data flows are treated as nodes in the graph, and the behavior rules of nodes are used as edges in the graph, constructing a graph with a relatively uniform number of neighbors for each node. Then, the edge behavior relationship weights are incorporated into the graph attention network to utilize the relationship between data flows and the structure information of the graph, which is used to improve the performance of the network intrusion detection. Finally, experiments are conducted based on the latest datasets to evaluate the performance of the proposed behavior similarity based graph attention network for the network intrusion detection. The results show that the proposed method is effective and has superior performance compared to existing solutions.", "output": "BS-GAT Behavior Similarity Based Graph Attention Network for Network Intrusion Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This article measures how sparsity can make neural networks more robust to membership inference attacks. The obtained empirical results show that sparsity improves the privacy of the network, while preserving comparable performances on the task at hand.
This empirical study completes and extends existing literature.", "output": "Sparsity in neural networks can increase their privacy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Design exploration is an important step in the engineering design process. This involves the search for design/s that meet the specified design criteria and accomplishes the predefined objective/s. In recent years, machine learning-based approaches have been widely used in engineering design problems. This paper showcases Artificial Neural Network (ANN) architecture applied to an engineering design problem to explore and identify improved design solutions. The case problem of this study is the design of flexible disc elements used in disc couplings. We are required to improve the design of the disc elements by lowering the mass and stress without lowering the torque transmission and misalignment capability. To accomplish this objective, we employ ANN coupled with genetic algorithm in the design exploration step to identify designs that meet the specified criteria (torque and misalignment) while having minimum mass and stress. The results are comparable to the optimized results obtained from the traditional response surface method.
This can have a huge advantage when we are evaluating conceptual designs against multiple conflicting requirements.", "output": "Machine Learning-Based Multi-Objective Design Exploration Of Flexible Disc Elements."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We provide a novel Neural Network architecture that can: i) output R-matrix for a given quantum integrable spin chain, ii) search for an integrable Hamiltonian and the corresponding R-matrix under assumptions of certain symmetries or other restrictions, iii) explore the space of Hamiltonians around already learned models and reconstruct the family of integrable spin chains which they belong to. The neural network training is done by minimizing loss functions encoding Yang-Baxter equation, regularity and other model-specific restrictions such as hermiticity. Holomorphy is implemented via the choice of activation functions. We demonstrate the work of our Neural Network on the two-dimensional spin chains of difference form. In particular, we reconstruct the R-matrices for all 14 classes. We also demonstrate its utility as an \\textit{Explorer}, scanning a certain subspace of Hamiltonians and identifying integrable classes after clusterisation. The last strategy can be used in future to carve out the map of integrable spin chains in higher dimensions and in more general settings where no analytical methods are available.", "output": "The R-mAtrIx Net."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection has seen remarkable progress in recent years with the introduction of Convolutional Neural Networks (CNN).
Object detection is a multi-task learning problem where both the position of the objects in the images as well as their classes need to be correctly identified. The idea here is to maximize the overlap between the ground-truth bounding boxes and the predictions i.e. the Intersection over Union (IoU). In the scope of work seen currently in this domain, IoU is approximated by using the Huber loss as a proxy but this indirect method does not leverage the IoU information and treats the bounding box as four independent, unrelated terms of regression. This is not true for a bounding box where the four coordinates are highly correlated and hold a semantic meaning when taken together. The direct optimization of the IoU is not possible due to its non-convex and non-differentiable nature. In this paper, we have formulated a novel loss namely, the Smooth IoU, which directly optimizes the IoUs for the bounding boxes. This loss has been evaluated on the Oxford IIIT Pets, Udacity self-driving car, PASCAL VOC, and VWFS Car Damage datasets and has shown performance gains over the standard Huber loss.", "output": "Directly Optimizing IoU for Bounding Box Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models pre-trained on large self-supervised corpora, followed by task-specific fine-tuning has become the dominant paradigm in NLP. These pre-training datasets often have a one-to-many structure--e.g. in dialogue there are many valid responses for a given context. However, only some of these responses will be desirable in our downstream task. This raises the question of how we should train the model such that it can emulate the desirable behaviours, but not the undesirable ones.
Current approaches train in a one-to-one setup--only a single target response is given for a single dialogue context--leading to models only learning to predict the average response, while ignoring the full range of possible responses. Using text-based games as a testbed, our approach, PASA, uses discrete latent variables to capture the range of different behaviours represented in our larger pre-training dataset. We then use knowledge distillation to distil the posterior probability distribution into a student model. This probability distribution is far richer than learning from only the hard targets of the dataset, and thus allows the student model to benefit from the richer range of actions the teacher model has learned. Results show up to 49% empirical improvement over the previous state-of-the-art model on the Jericho Walkthroughs dataset.", "output": "Learn What Is Possible, Then Choose What Is Best: Disentangling One-To-Many Relations in Language Through Text-based Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The standard non-invasive imaging technique used to assess the severity and extent of Coronary Artery Disease (CAD) is Coronary Computed Tomography Angiography (CCTA). However, manual grading of each patient's CCTA according to the CAD-Reporting and Data System (CAD-RADS) scoring is time-consuming and operator-dependent, especially in borderline cases. This work proposes a fully automated, and visually explainable, deep learning pipeline to be used as a decision support system for the CAD screening procedure. The pipeline performs two classification tasks: firstly, identifying patients who require further clinical investigations and secondly, classifying patients into subgroups based on the degree of stenosis, according to commonly used CAD-RADS thresholds.
The pipeline pre-processes multiplanar projections of the coronary arteries, extracted from the original CCTAs, and classifies them using a fine-tuned Multi-Axis Vision Transformer architecture. With the aim of emulating the current clinical practice, the model is trained to assign a per-patient score by stacking the bi-dimensional longitudinal cross-sections of the three main coronary arteries along channel dimension. Furthermore, it generates visually interpretable maps to assess the reliability of the predictions. When run on a database of 1873 three-channel images of 253 patients collected at the Monzino Cardiology Center in Milan, the pipeline obtained an AUC of 0.87 and 0.93 for the two classification tasks, respectively. According to our knowledge, this is the first model trained to assign CAD-RADS scores learning solely from patient scores and not requiring finer imaging annotation steps that are not part of the clinical routine.", "output": "CAD-RADS scoring of coronary CT angiography with Multi-Axis Vision Transformer: a clinically-inspired deep learning pipeline."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies reward-agnostic exploration in reinforcement learning (RL) -- a scenario where the learner is unaware of the reward functions during the exploration stage -- and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon non-stationary Markov decision process with $S$ states, $A$ actions, and horizon length $H$, and suppose that there are no more than a polynomial number of given reward functions of interest.
By collecting an order of \\begin{align*}\\frac{SAH^3}{\\varepsilon^2} \\text{ sample episodes (up to log factor)}\\end{align*} without guidance of the reward information, our algorithm is able to find $\\varepsilon$-optimal policies for all these reward functions, provided that $\\varepsilon$ is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds $\\frac{S^2AH^3}{\\varepsilon^2}$ episodes (up to log factor), our algorithm is able to yield $\\varepsilon$ accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed as ``reward-free exploration.'' The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning paradigm leverages ideas from sample-optimal offline RL paradigms.", "output": "Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of synthetically generating data that can closely resemble human decisions made in the context of an interactive human-AI system like a computer game. We propose a novel algorithm that can generate synthetic, human-like, decision making data while starting from a very small set of decision making data collected from humans. Our proposed algorithm integrates the concept of reward shaping with an imitation learning algorithm to generate the synthetic data. We have validated our synthetic data generation technique by using the synthetically generated data as a surrogate for human interaction data to solve three sequential decision making tasks of increasing complexity within a small computer game-like setup.
Different empirical and statistical analyses of our results show that the synthetically generated data can substitute the human data and perform the game-playing tasks almost indistinguishably, with very low divergence, from a human performing the same tasks.", "output": "Synthetically Generating Human-like Data for Sequential Decision Making Tasks via Reward-Shaped Imitation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network, when the softmax is used. But, what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of losses, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps, which only depend on the loss function and the hypothesis set. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, derived from their comp-sum counterparts by adding in a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state-of-the-art, while also achieving a superior non-adversarial accuracy.", "output": "Cross-Entropy Loss Functions: Theoretical Analysis and Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the geometric properties of the functions learned by trained ConvNets in the preactivation space of their convolutional layers, by performing an empirical study of hyperplane arrangements induced by a convolutional layer. We introduce statistics over the weights of a trained network to study local arrangements and relate them to the training dynamics. We observe that trained ConvNets show a significant statistical bias towards regular hyperplane configurations. Furthermore, we find that layers showing biased configurations are critical to validation performance for the architectures considered, trained on CIFAR10, CIFAR100 and ImageNet.", "output": "Hyperplane Arrangements of Trained ConvNets Are Biased."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most signal processing and statistical applications heavily rely on specific data distribution models. The Gaussian distributions, although being the most common choice, are inadequate in most real world scenarios as they fail to account for data coming from heavy-tailed populations or contaminated by outliers. Such problems call for the use of Robust Statistics. The robust models and estimators are usually based on elliptical populations, making the latter ubiquitous in all methods of robust statistics.
To determine whether such tools are applicable in any specific case, goodness-of-fit (GoF) tests are used to verify the ellipticity hypothesis. Ellipticity GoF tests are usually hard to analyze and often their statistical power is not particularly strong. In this work, assuming the true covariance matrix is unknown we design and rigorously analyze a robust GoF test consistent against all alternatives to ellipticity on the unit sphere. The proposed test is based on Tyler's estimator and is formulated in terms of easily computable statistics of the data. For its rigorous analysis, we develop a novel framework based on the exchangeable random variables calculus introduced by de Finetti. Our findings are supported by numerical simulations comparing them to other popular GoF tests and demonstrating the significantly higher statistical power of the suggested technique.", "output": "A Robust Test for Elliptical Symmetry."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature.
In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to use the model as part of a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum. Code is available at .", "output": "Variational Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we investigate a two-layer fully connected neural network of the form $f(X)=\\frac{1}{\\sqrt{d_1}}\\boldsymbol{a}^\\top \\sigma\\left(WX\\right)$, where $X\\in\\mathbb{R}^{d_0\\times n}$ is a deterministic data matrix, $W\\in\\mathbb{R}^{d_1\\times d_0}$ and $\\boldsymbol{a}\\in\\mathbb{R}^{d_1}$ are random Gaussian weights, and $\\sigma$ is a nonlinear activation function. We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$: the empirical conjugate kernel (CK) and neural tangent kernel (NTK), beyond the linear-width regime ($d_1\\asymp n$). We focus on the $\\textit{ultra-wide regime}$, where the width $d_1$ of the first layer is much larger than the sample size $n$. Under appropriate assumptions on $X$ and $\\sigma$, a deformed semicircle law emerges as $d_1/n\\to\\infty$ and $n\\to\\infty$. We first prove this limiting law for generalized sample covariance matrices with some dependency.
To specify it for our neural network model, we provide a nonlinear Hanson-Wright inequality that is suitable for neural networks with random weights and Lipschitz activation functions. We also demonstrate non-asymptotic concentrations of the empirical CK and NTK around their limiting kernels in the spectral norm, along with lower bounds on their smallest eigenvalues. As an application, we show that random feature regression induced by the empirical kernel achieves the same asymptotic performance as its limiting kernel regression under the ultra-wide regime. This allows us to calculate the asymptotic training and test errors for random feature regression using the corresponding kernel regression.", "output": "Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes.
This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55% and a Matthews correlation coefficient (MCC) as high as 53.16%. This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.", "output": "Graph-Based Machine Learning Improves Just-in-Time Defect Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "State-of-the-art deep neural networks (DNNs) have been proven to be vulnerable to adversarial manipulation and backdoor attacks. Backdoored models deviate from expected behavior on inputs with predefined triggers while retaining performance on clean data. Recent works focus on software simulation of backdoor injection during the inference phase by modifying network weights, which we find often unrealistic in practice due to restrictions in hardware. In contrast, in this work for the first time, we present an end-to-end backdoor injection attack realized on actual hardware on a classifier model using Rowhammer as the fault injection method. To this end, we first investigate the viability of backdoor injection attacks in real-life deployments of DNNs on hardware and address such practical issues in hardware implementation from a novel optimization perspective. We are motivated by the fact that vulnerable memory locations are very rare, device-specific, and sparsely distributed. Consequently, we propose a novel network training algorithm based on constrained optimization to achieve a realistic backdoor injection attack in hardware.
By modifying parameters uniformly across the convolutional and fully-connected layers as well as optimizing the trigger pattern together, we achieve state-of-the-art attack performance with fewer bit flips. For instance, our method on a hardware-deployed ResNet-20 model trained on CIFAR-10 achieves over 89% test accuracy and 92% attack success rate by flipping only 10 out of 2.2 million bits.", "output": "Don't Knock! Rowhammer at the Backdoor of DNN Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Considering a probability distribution over parameters is known as an efficient strategy to learn a neural network with non-differentiable activation functions. We study the expectation of a probabilistic neural network as a predictor by itself, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights. Our work leverages a recent analysis derived from the PAC-Bayesian framework that derives tight generalization bounds and learning procedures for the expected output value of such an aggregation, which is given by an analytical expression. While the combinatorial nature of the latter has been circumvented by approximations in previous works, we show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach. This leads us to a peculiar bound minimization learning algorithm for binary activated neural networks, where the forward pass propagates probabilities over representations instead of activation values.
A stochastic counterpart that scales to wide architectures is proposed.", "output": "PAC-Bayesian Learning of Aggregated Binary Activated Neural Networks with Probabilities over Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose physics-informed neural operators (PINO) that use available data and/or physics constraints to learn the solution operator of a family of parametric Partial Differential Equation (PDE). This hybrid approach allows PINO to overcome the limitations of purely data-driven and physics-based methods. For instance, data-driven methods fail to learn when data is of limited quantity and/or quality, and physics-based approaches fail to optimize on challenging PDE constraints. By combining both data and PDE constraints, PINO overcomes all these challenges. Additionally, a unique property that PINO enjoys over other hybrid learning methods is its ability to incorporate data and PDE constraints at different resolutions. This allows us to combine coarse-resolution data, which is inexpensive to obtain from numerical solvers, with higher resolution PDE constraints, and the resulting PINO has no degradation in accuracy even on high-resolution test instances. This discretization-invariance property in PINO is due to the neural-operator framework which learns mappings between function spaces and allows evaluation at different resolutions without the need for re-training. Moreover, PINO succeeds in the purely physics setting, where no data is available, while other approaches such as the Physics-Informed Neural Network (PINN) fail due to optimization challenges, e.g. in multi-scale dynamic systems such as Kolmogorov flows. This is because PINO learns the solution operator by optimizing PDE constraints on multiple instances while PINN optimizes PDE constraints of a single PDE instance.
Further, in PINO, we incorporate the Fourier neural operator (FNO) architecture which achieves orders-of-magnitude speedup over numerical solvers and also allows us to compute explicit gradients on function spaces efficiently.", "output": "Physics-Informed Neural Operator for Learning Partial Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off-policy setting, however, has been less clear due to the existence of multiple objectives and the lack of an explicit off-policy policy gradient theorem. In this work, we unify these objectives into one off-policy objective, and provide a policy gradient theorem for this unified objective. The derivation involves emphatic weightings and interest functions. We show multiple strategies to approximate the gradients, in an algorithm called Actor Critic with Emphatic weightings (ACE). We prove in a counterexample that previous (semi-gradient) off-policy actor-critic methods--particularly Off-Policy Actor-Critic (OffPAC) and Deterministic Policy Gradient (DPG)--converge to the wrong solution whereas ACE finds the optimal solution. We also highlight why these semi-gradient approaches can still perform well in practice, suggesting strategies for variance reduction in ACE. We empirically study several variants of ACE on two classic control environments and an image-based environment designed to illustrate the tradeoffs made by each gradient approximation.
We find that by approximating the emphatic weightings directly, ACE performs as well as or better than OffPAC in all settings tested.", "output": "Off-Policy Actor-Critic with Emphatic Weightings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces two new ensemble-based methods to reduce the data and computation costs of image classification. They can be used with any set of classifiers and do not require additional training. In the first approach, data usage is reduced by only analyzing a full-sized image if the model has low confidence in classifying a low-resolution pixelated version. When applied on the best performing classifiers considered here, data usage is reduced by 61.2% on MNIST, 69.6% on KMNIST, 56.3% on FashionMNIST, 84.6% on SVHN, 40.6% on ImageNet, and 27.6% on ImageNet-V2, all with a less than 5% reduction in accuracy. However, for CIFAR-10, the pixelated data are not particularly informative, and the ensemble approach increases data usage while reducing accuracy. In the second approach, compute costs are reduced by only using a complex model if a simpler model has low confidence in its classification. Computation cost is reduced by 82.1% on MNIST, 47.6% on KMNIST, 72.3% on FashionMNIST, 86.9% on SVHN, 89.2% on ImageNet, and 81.5% on ImageNet-V2, all with a less than 5% reduction in accuracy; for CIFAR-10 the corresponding improvements are smaller at 13.5%.
When cost is not an object, choosing the projection from the most confident model for each observation increases validation accuracy to 81.0% from 79.3% for ImageNet and to 69.4% from 67.5% for ImageNet-V2.", "output": "Problem-dependent attention and effort in neural networks with applications to image resolution and model selection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep learning-based algorithms are widely adopted due to the advantage of being able to establish anomaly detection models without or with minimal domain knowledge of the task. Instead, to train the artificial neural network more stable, it should be better to define the appropriate neural network structure or the loss function. For the training anomaly detection model, the mean squared error (MSE) function is adopted widely. On the other hand, the novel loss function, logarithmic mean squared error (LMSE), is proposed in this paper to train the neural network more stable. This study covers a variety of comparisons from mathematical comparisons, visualization in the differential domain for backpropagation, loss convergence in the training process, and anomaly detection performance. In an overall view, LMSE is superior to the existing MSE function in terms of strongness of loss convergence, anomaly detection performance. The LMSE function is expected to be applicable for training not only the anomaly detection model but also the general generative neural network.", "output": "Concise Logarithmic Loss Function for Robust Training of Anomaly Detection Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce the notion of a reproducible algorithm in the context of learning.
A reproducible learning algorithm is resilient to variations in its samples -- with high probability, it returns the exact same output when run on two samples from the same underlying distribution. We begin by unpacking the definition, clarifying how randomness is instrumental in balancing accuracy and reproducibility. We initiate a theory of reproducible algorithms, showing how reproducibility implies desirable properties such as data reuse and efficient testability. Despite the exceedingly strong demand of reproducibility, there are efficient reproducible algorithms for several fundamental problems in statistics and learning. First, we show that any statistical query algorithm can be made reproducible with a modest increase in sample complexity, and we use this to construct reproducible algorithms for finding approximate heavy-hitters and medians. Using these ideas, we give the first reproducible algorithm for learning halfspaces via a reproducible weak learner and a reproducible boosting algorithm. Finally, we initiate the study of lower bounds and inherent tradeoffs for reproducible algorithms, giving nearly tight sample complexity upper and lower bounds for reproducible versus nonreproducible SQ algorithms.", "output": "Reproducibility in Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model Predictive Control (MPC) has become a popular framework in embedded control for high-performance autonomous systems. However, to achieve good control performance using MPC, an accurate dynamics model is key. To maintain real-time operation, the dynamics models used on embedded systems have been limited to simple first-principle models, which substantially limits their representative power.
In contrast to such simple models, machine learning approaches, specifically neural networks, have been shown to accurately model even complex dynamic effects, but their large computational complexity hindered combination with fast real-time iteration loops. With this work, we present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline. Our experiments, performed in simulation and the real world onboard a highly agile quadrotor platform, demonstrate the capabilities of the described system to run learned models with, previously infeasible, large modeling capacity using gradient-based online optimization MPC. Compared to prior implementations of neural networks in online optimization MPC we can leverage models of over 4000 times larger parametric capacity in a 50Hz real-time window on an embedded platform. Further, we show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.", "output": "Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Background: Many published machine learning studies are irreproducible. Issues with methodology and not properly accounting for variation introduced by the algorithm themselves or their implementations are attributed as the main contributors to the irreproducibility. Problem: There exist no theoretical framework that relates experiment design choices to potential effects on the conclusions. Without such a framework, it is much harder for practitioners and researchers to evaluate experiment results and describe the limitations of experiments.
The lack of such a framework also makes it harder for independent researchers to systematically attribute the causes of failed reproducibility experiments. Objective: The objective of this paper is to develop a framework that enable applied data science practitioners and researchers to understand which experiment design choices can lead to false findings and how and by this help in analyzing the conclusions of reproducibility experiments. Method: We have compiled an extensive list of factors reported in the literature that can lead to machine learning studies being irreproducible. These factors are organized and categorized in a reproducibility framework motivated by the stages of the scientific method. The factors are analyzed for how they can affect the conclusions drawn from experiments. A model comparison study is used as an example. Conclusion: We provide a framework that describes machine learning methodology from experimental design decisions to the conclusions inferred from them.", "output": "Sources of Irreproducibility in Machine Learning: A Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity.
At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.", "output": "Making SGD Parameter-Free."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Is overparameterization a privacy liability? In this work, we study the effect that the number of parameters has on a classifier's vulnerability to membership inference attacks. We first demonstrate how the number of parameters of a model can induce a privacy--utility trade-off: increasing the number of parameters generally improves generalization performance at the expense of lower privacy. However, remarkably, we then show that if coupled with proper regularization, increasing the number of parameters of a model can actually simultaneously increase both its privacy and performance, thereby eliminating the privacy--utility trade-off. Theoretically, we demonstrate this curious phenomenon for logistic regression with ridge regularization in a bi-level feature ensemble setting. Pursuant to our theoretical exploration, we develop a novel leave-one-out analysis tool to precisely characterize the vulnerability of a linear classifier to the optimal membership inference attack. We empirically exhibit this \"blessing of dimensionality\" for neural networks on a variety of tasks using early stopping as the regularizer.", "output": "A Blessing of Dimensionality in Membership Inference through Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We develop two compression based stochastic gradient algorithms to solve a class of non-smooth strongly convex-strongly concave saddle-point problems in a decentralized setting (without a central server).
Our first algorithm is a Restart-based Decentralized Proximal Stochastic Gradient method with Compression (C-RDPSG) for general stochastic settings. We provide rigorous theoretical guarantees of C-RDPSG with gradient computation complexity and communication complexity of order $\\mathcal{O}( (1+\\delta)^4 \\frac{1}{L^2}{\\kappa_f^2}\\kappa_g^2 \\frac{1}{\\epsilon} )$, to achieve an $\\epsilon$-accurate saddle-point solution, where $\\delta$ denotes the compression factor, $\\kappa_f$ and $\\kappa_g$ denote respectively the condition numbers of objective function and communication graph, and $L$ denotes the smoothness parameter of the smooth part of the objective function. Next, we present a Decentralized Proximal Stochastic Variance Reduced Gradient algorithm with Compression (C-DPSVRG) for finite sum setting which exhibits gradient computation complexity and communication complexity of order $\\mathcal{O}\\left((1+\\delta) \\max\\{\\kappa_f^2, \\sqrt{\\delta}\\kappa^2_f\\kappa_g, \\kappa_g\\} \\log\\left(\\frac{1}{\\epsilon}\\right)\\right)$. Extensive numerical experiments show competitive performance of the proposed algorithms and provide support to the theoretical results obtained.", "output": "Stochastic Gradient Methods with Compressed Communication for Decentralized Saddle Point Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven, learning-based control methods offer the potential to improve operations in complex systems, and model-free deep reinforcement learning represents a popular approach to data-driven control. However, existing classes of algorithms present a trade-off between two important deployment requirements for real-world control: (i) practical performance guarantees and (ii) data efficiency.
Off-policy algorithms make efficient use of data through sample reuse but lack theoretical guarantees, while on-policy algorithms guarantee approximate policy improvement throughout training but suffer from high sample complexity. In order to balance these competing goals, we develop a class of Generalized Policy Improvement algorithms that combines the policy improvement guarantees of on-policy methods with the efficiency of sample reuse. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a variety of continuous control tasks from the DeepMind Control Suite.", "output": "Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In deterministic optimization, it is typically assumed that all problem parameters are fixed and known. In practice, however, some parameters may be a priori unknown but can be estimated from historical data. A typical predict-then-optimize approach separates predictions and optimization into two stages. Recently, end-to-end predict-then-optimize has become an attractive alternative. In this work, we present the PyEPO package, a PyTorch-based end-to-end predict-then-optimize library in Python. To the best of our knowledge, PyEPO (pronounced like pineapple with a silent \"n\") is the first such generic tool for linear and integer programming with predicted objective function coefficients. It provides four base algorithms: a convex surrogate loss function from the seminal work of Elmachtoub and Grigas [16], a differentiable black-box solver approach of Pogancic et al. [35], and two differentiable perturbation-based methods from Berthet et al. [6].
PyEPO provides a simple interface for the definition of new optimization problems, the implementation of state-of-the-art predict-then-optimize training algorithms, the use of custom neural network architectures, and the comparison of end-to-end approaches with the two-stage approach. PyEPO enables us to conduct a comprehensive set of experiments comparing a number of end-to-end and two-stage approaches along axes such as prediction accuracy, decision quality, and running time on problems such as Shortest Path, Multiple Knapsack, and the Traveling Salesperson Problem. We discuss some empirical insights from these experiments, which could guide future research. PyEPO and its documentation are available at ", "output": "PyEPO: A PyTorch-based End-to-End Predict-then-Optimize Library for Linear and Integer Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing body of work studies Blindspot Discovery Methods (\"BDM\"s): methods that use an image embedding to find semantically meaningful (i.e., united by a human-understandable concept) subsets of the data where an image classifier performs significantly worse. Motivated by observed gaps in prior work, we introduce a new framework for evaluating BDMs, SpotCheck, that uses synthetic image datasets to train models with known blindspots and a new BDM, PlaneSpot, that uses a 2D image representation. We use SpotCheck to run controlled experiments that identify factors that influence BDM performance (e.g., the number of blindspots in a model, or features used to define the blindspot) and show that PlaneSpot is competitive with and in many cases outperforms existing BDMs. Importantly, we validate these findings by designing additional experiments that use real image data from MS-COCO, a large image benchmark dataset.
Our findings suggest several promising directions for future work on BDM design and evaluation. Overall, we hope that the methodology and analyses presented in this work will help facilitate a more rigorous science of blindspot discovery.", "output": "Towards a More Rigorous Science of Blindspot Discovery in Image Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work studies the threats of adversarial attack on multivariate probabilistic forecasting models and viable defense mechanisms. Our studies discover a new attack pattern that negatively impact the forecasting of a target time series via making strategic, sparse (imperceptible) modifications to the past observations of a small number of other time series. To mitigate the impact of such attack, we have developed two defense strategies. First, we extend a previously developed randomized smoothing technique in classification to multivariate forecasting scenarios. Second, we develop an adversarial training algorithm that learns to create adversarial examples and at the same time optimizes the forecasting model to improve its robustness against such adversarial simulation. Extensive experiments on real-world datasets confirm that our attack schemes are powerful and our defense algorithms are more effective compared with baseline defense mechanisms.", "output": "Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Normalizing Flows (NF) are powerful likelihood-based generative models that are able to trade off between expressivity and tractability to model complex densities. A now well established research avenue leverages optimal transport (OT) and looks for Monge maps, i.e.
models with minimal effort between the source and target distributions. This paper introduces a method based on Brenier's polar factorization theorem to transform any trained NF into a more OT-efficient version without changing the final density. We do so by learning a rearrangement of the source (Gaussian) distribution that minimizes the OT cost between the source and the final density. We further constrain the path leading to the estimated Monge map to lie on a geodesic in the space of volume-preserving diffeomorphisms thanks to Euler's equations. The proposed method leads to smooth flows with reduced OT cost for several existing models without affecting the model performance.", "output": "Turning Normalizing Flows into Monge Maps with Geodesic Gaussian Preserving Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regularized optimal transport (OT) is now increasingly used as a loss or as a matching layer in neural networks. Entropy-regularized OT can be computed using the Sinkhorn algorithm but it leads to fully-dense transportation plans, meaning that all sources are (fractionally) matched with all targets. To address this issue, several works have investigated quadratic regularization instead. This regularization preserves sparsity and leads to unconstrained and smooth (semi) dual objectives, that can be solved with off-the-shelf gradient methods. Unfortunately, quadratic regularization does not give direct control over the cardinality (number of nonzeros) of the transportation plan. We propose in this paper a new approach for OT with explicit cardinality constraints on the transportation plan. Our work is motivated by an application to sparse mixture of experts, where OT can be used to match input tokens such as image patches with expert models such as neural networks.
Cardinality constraints ensure that at most $k$ tokens are matched with an expert, which is crucial for computational performance reasons. Despite the nonconvexity of cardinality constraints, we show that the corresponding (semi) dual problems are tractable and can be solved with first-order gradient methods. Our method can be thought as a middle ground between unregularized OT (recovered in the limit case $k=1$) and quadratically-regularized OT (recovered when $k$ is large enough). The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.", "output": "Sparsity-Constrained Optimal Transport."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Temporal graph is an abstraction for modeling dynamic systems that consist of evolving interaction elements. In this paper, we aim to solve an important yet neglected problem -- how to learn information from high-order neighbors in temporal graphs? -- to enhance the informativeness and discriminativeness for the learned node representations. We argue that when learning high-order information from temporal graphs, we encounter two challenges, i.e., computational inefficiency and over-smoothing, that cannot be solved by conventional techniques applied on static graphs. To remedy these deficiencies, we propose a temporal propagation-based graph neural network, namely TPGNN. To be specific, the model consists of two distinct components, i.e., propagator and node-wise encoder. The propagator is leveraged to propagate messages from the anchor node to its temporal neighbors within $k$-hop, and then simultaneously update the state of neighborhoods, which enables efficient computation, especially for a deep model.
In addition, to prevent over-smoothing, the model compels the messages from $n$-hop neighbors to update the $n$-hop memory vector preserved on the anchor. The node-wise encoder adopts transformer architecture to learn node representations by explicitly learning the importance of memory vectors preserved on the node itself, that is, implicitly modeling the importance of messages from neighbors at different layers, thus mitigating the over-smoothing. Since the encoding process will not query temporal neighbors, we can dramatically save time consumption in inference. Extensive experiments on temporal link prediction and node classification demonstrate the superiority of TPGNN over state-of-the-art baselines in efficiency and robustness.", "output": "TPGNN: Learning High-order Information in Dynamic Graphs via Temporal Propagation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic graph neural network (DGNN) is becoming increasingly popular because of its widespread use in capturing dynamic features in the real world. A variety of dynamic graph neural networks designed from algorithmic perspectives have succeeded in incorporating temporal information into graph processing. Despite the promising algorithmic performance, deploying DGNNs on hardware presents additional challenges due to the model complexity, diversity, and the nature of the time dependency. Meanwhile, the differences between DGNNs and static graph neural networks make hardware-related optimizations for static graph neural networks unsuitable for DGNNs. In this paper, we select eight prevailing DGNNs with different characteristics and profile them on both CPU and GPU. The profiling results are summarized and analyzed, providing in-depth insights into the bottlenecks of DGNNs on hardware and identifying potential optimization opportunities for future DGNN acceleration.
Followed by a comprehensive survey, we provide a detailed analysis of DGNN performance bottlenecks on hardware, including temporal data dependency, workload imbalance, data movement, and GPU warm-up. We suggest several optimizations from both software and hardware perspectives. This paper is the first to provide an in-depth analysis of the hardware performance of DGNN. Code is available at ", "output": "Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weather and climate simulations produce petabytes of high-resolution data that are later analyzed by researchers in order to understand climate change or severe weather. We propose a new method of compressing this multidimensional weather and climate data: a coordinate-based neural network is trained to overfit the data, and the resulting parameters are taken as a compact representation of the original grid-based data. While compression ratios range from 300x to more than 3,000x, our method outperforms the state-of-the-art compressor SZ3 in terms of weighted RMSE, MAE. It can faithfully preserve important large scale atmosphere structures and does not introduce artifacts. When using the resulting neural network as a 790x compressed dataloader to train the WeatherBench forecasting model, its RMSE increases by less than 2%. The three orders of magnitude compression democratizes access to high-resolution climate data and enables numerous new research directions.", "output": "Compressing multidimensional weather and climate data into neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Uncertainty quantification is a central challenge in reliable and trustworthy machine learning.
Naive measures such as last-layer scores are well-known to yield overconfident estimates in the context of overparametrized neural networks. Several methods, ranging from temperature scaling to different Bayesian treatments of neural networks, have been proposed to mitigate overconfidence, most often supported by the numerical observation that they yield better calibrated uncertainty measures. In this work, we provide a sharp comparison between popular uncertainty measures for binary classification in a mathematically tractable model for overparametrized neural networks: the random features model. We discuss a trade-off between classification accuracy and calibration, unveiling a double descent like behavior in the calibration curve of optimally regularized estimators as a function of overparametrization. This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.", "output": "A study of uncertainty quantification in overparametrized high-dimensional models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Performance monitoring of machine learning (ML)-based risk prediction models in healthcare is complicated by the issue of confounding medical interventions (CMI): when an algorithm predicts a patient to be at high risk for an adverse event, clinicians are more likely to administer prophylactic treatment and alter the very target that the algorithm aims to predict. A simple approach is to ignore CMI and monitor only the untreated patients, whose outcomes remain unaltered. In general, ignoring CMI may inflate Type I error because (i) untreated patients disproportionally represent those with low predicted risk and (ii) evolution in both the model and clinician trust in the model can induce complex dependencies that violate standard assumptions.
Nevertheless, we show that valid inference is still possible if one monitors conditional performance and if either conditional exchangeability or time-constant selection bias hold. Specifically, we develop a new score-based cumulative sum (CUSUM) monitoring procedure with dynamic control limits. Through simulations, we demonstrate the benefits of combining model updating with monitoring and investigate how over-trust in a prediction model may delay detection of performance deterioration. Finally, we illustrate how these monitoring methods can be used to detect calibration decay of an ML-based risk calculator for postoperative nausea and vomiting during the COVID-19 pandemic.", "output": "Monitoring machine learning (ML)-based risk prediction algorithms in the presence of confounding medical interventions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Algorithmic fairness plays an increasingly critical role in machine learning research. Several group fairness notions and algorithms have been proposed. However, the fairness guarantee of existing fair classification methods mainly depends on specific data distributional assumptions, often requiring large sample sizes, and fairness could be violated when there is a modest number of samples, which is often the case in practice. In this paper, we propose FaiREE, a fair classification algorithm that can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. FaiREE can be adapted to satisfy various group fairness notions (e.g., Equality of Opportunity, Equalized Odds, Demographic Parity, etc.) and achieve the optimal accuracy. These theoretical guarantees are further supported by experiments on both synthetic and real data.
FaiREE is shown to have favorable performance over state-of-the-art algorithms.", "output": "FaiREE: Fair Classification with Finite-Sample and Distribution-Free Guarantee."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return. This leads to a problem of off-policy evaluation, where one needs to evaluate the target policy from samples collected by the often unrelated behaviour policy. Importance sampling is a traditional statistical technique that is often applied to off-policy evaluation. While importance sampling estimators are unbiased, their variance increases exponentially with the horizon of the decision process due to computing the importance weight as a product of action probability ratios, yielding estimates with low accuracy for domains involving long-term planning. This paper proposes state-based importance sampling (SIS), which drops the action probability ratios of sub-trajectories with \"negligible states\" -- roughly speaking, those for which the chosen actions have no impact on the return estimate -- from the computation of the importance weight. Theoretical results demonstrate a smaller exponent for the variance upper bound as well as a lower mean squared error. To identify negligible states, two search algorithms are proposed, one based on covariance testing and one based on state-action values. Using the formulation of SIS, we then analogously formulate state-based variants of weighted importance sampling, per-decision importance sampling, and incremental importance sampling based on the state-action value identification algorithm. Moreover, we note that doubly robust estimators may also benefit from SIS.
Experiments in two gridworld domains and one inventory management domain show that state-based methods yield reduced variance and improved accuracy.", "output": "Low Variance Off-policy Evaluation with State-based Importance Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current work in named entity recognition (NER) uses either cross entropy (CE) or conditional random fields (CRF) as the objective/loss functions to optimize the underlying NER model. Both of these traditional objective functions for the NER problem generally produce adequate performance when the data distribution is balanced and there are sufficient annotated training examples. But since NER is inherently an imbalanced tagging problem, the model performance under the low-resource settings could suffer using these standard objective functions. Based on recent advances in area under the ROC curve (AUC) maximization, we propose to optimize the NER model by maximizing the AUC score. We give evidence that by simply combining two binary-classifiers that maximize the AUC score, significant performance improvement over traditional loss functions is achieved under low-resource NER settings. We also conduct extensive experiments to demonstrate the advantages of our method under the low-resource and highly-imbalanced data distribution settings. To the best of our knowledge, this is the first work that brings AUC maximization to the NER setting. Furthermore, we show that our method is agnostic to different types of NER embeddings, models and domains.
The code to replicate this work will be provided upon request.", "output": "AUC Maximization for Low-Resource Named Entity Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The evolution of wireless communications into 6G and beyond is expected to rely on new machine learning (ML)-based capabilities. These can enable proactive decisions and actions from wireless-network components to sustain quality-of-service (QoS) and user experience. Moreover, new use cases in the area of vehicular and industrial communications will emerge. Specifically in the area of vehicle communication, vehicle-to-everything (V2X) schemes will benefit strongly from such advances. With this in mind, we have conducted a detailed measurement campaign that paves the way to a plethora of diverse ML-based studies. The resulting datasets offer GPS-located wireless measurements across diverse urban environments for both cellular (with two different operators) and sidelink radio access technologies, thus enabling a variety of different studies towards V2X. The datasets are labeled and sampled with a high time resolution. Furthermore, we make the data publicly available with all the necessary information to support the onboarding of new researchers. We provide an initial analysis of the data showing some of the challenges that ML needs to overcome and the features that ML can leverage, as well as some hints at potential research studies.", "output": "Berlin V2X: A Machine Learning Dataset from Multiple Vehicles and Radio Access Technologies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally accessible. 
Under this modeling assumption, we define a novel theory-guided machine learning approach that employs a generalized Bayesian algorithm to make predictions. We employ a Lipschitz predictor, for example, a linear model or a feed-forward neural network, and determine a randomized estimator by minimizing a novel PAC Bayesian bound for data serially correlated along a spatial and temporal dimension. Performing causal future predictions is a highlight of our methodology as its potential application to data with short and long-range dependence. We conclude by showing the performance of the learning methodology in an example with linear predictors and simulated spatio-temporal data from an STOU process.", "output": "Mixed moving average field guided learning for spatio-temporal data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose to integrate weapon system features (such as weapon system manufacturer, deployment time and location, storage time and location, etc.) into a parameterized Cox-Weibull [1] reliability model via a neural network, like DeepSurv [2], to improve predictive maintenance. In parallel, we develop an alternative Bayesian model by parameterizing the Weibull parameters with a neural network and employing dropout methods such as Monte-Carlo (MC)-dropout for comparative purposes. Due to data collection procedures in weapon system testing we employ a novel interval-censored log-likelihood which incorporates Monte-Carlo Markov Chain (MCMC) [3] sampling of the Weibull parameters during gradient descent optimization. 
We compare classification metrics such as receiver operator curve (ROC) area under the curve (AUC), precision-recall (PR) AUC, and F scores to show our model generally outperforms traditional powerful models such as XGBoost and the current standard conditional Weibull probability density estimation model.", "output": "Bayesian Weapon System Reliability Modeling with Cox-Weibull Neural Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue, that in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing and present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms for acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin in three publicly available SITS semantic segmentation and classification datasets. 
All model, training and evaluation codes are made publicly available to facilitate further research.", "output": "ViTs for SITS: Vision Transformers for Satellite Image Time Series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e. an attacker that adapts its strategy in response to the defender strategy. Further, the optimal stopping formulation allows us to prove that optimal strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.", "output": "Learning Near-Optimal Intrusion Responses Against Dynamic Attackers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Neural Networks (GNNs) are often used for tasks involving the geometry of a given graph, such as molecular dynamics simulation. 
Although the distance matrix of a geometric graph contains complete geometric information, it has been demonstrated that Message Passing Neural Networks (MPNNs) are insufficient for learning this geometry. In this work, we expand on the families of counterexamples that MPNNs are unable to distinguish from their distance matrices, by constructing families of novel and symmetric geometric graphs. We then propose $k$-DisGNNs, which can effectively exploit the rich geometry contained in the distance matrix. We demonstrate the high expressive power of our models by proving the universality of $k$-DisGNNs for distinguishing geometric graphs when $k \\geq 3$, and that some existing well-designed geometric models can be unified by $k$-DisGNNs as special cases. Most importantly, we establish a connection between geometric deep learning and traditional graph representation learning, showing that those highly expressive GNN models originally designed for graph structure learning can also be applied to geometric deep learning problems with impressive performance, and that existing complex, equivariant models are not the only solution. Experimental results verify our theory.", "output": "Is Distance Matrix Enough for Geometric Deep Learning?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Teachers and students are increasingly relying on online learning resources to supplement the ones provided in school. This increase in the breadth and depth of available resources is a great thing for students, but only provided they are able to find answers to their queries. Question-answering and information retrieval systems have benefited from public datasets to train and evaluate their algorithms, but most of these datasets have been in English text written by and for adults. 
We introduce a new public French question-answering dataset collected from Alloprof, a Quebec-based primary and high-school help website, containing 29 349 questions and their explanations in a variety of school subjects from 10 368 students, with more than half of the explanations containing links to other questions or some of the 2 596 reference pages on the website. We also present a case study of this dataset in an information retrieval task. This dataset was collected on the Alloprof public forum, with all questions verified for their appropriateness and the explanations verified both for their appropriateness and their relevance to the question. To predict relevant documents, architectures using pre-trained BERT models were fine-tuned and evaluated. This dataset will allow researchers to develop question-answering, information retrieval and other algorithms specifically for the French speaking education context. Furthermore, the range of language proficiency, images, mathematical symbols and spelling mistakes will necessitate algorithms based on a multimodal comprehension. The case study we present as a baseline shows an approach that relies on recent techniques provides an acceptable performance level, but more work is necessary before it can reliably be used and trusted in a production setting.", "output": "Alloprof: a new French question-answer education dataset and its use in an information retrieval case study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recommendation systems are a core feature of social media companies with their uses including recommending organic and promoted contents. Many modern recommendation systems are split into multiple stages - candidate generation and heavy ranking - to balance computational cost against recommendation quality. 
We focus on the candidate generation phase of a large-scale ads recommendation problem in this paper, and present a machine learning first heterogeneous re-architecture of this stage which we term TwERC. We show that a system that combines a real-time light ranker with sourcing strategies capable of capturing additional information provides validated gains. We present two strategies. The first strategy uses a notion of similarity in the interaction graph, while the second strategy caches previous scores from the ranking stage. The graph based strategy achieves a 4.08% revenue gain and the rankscore based strategy achieves a 1.38% gain. These two strategies have biases that complement both the light ranker and one another. Finally, we describe a set of metrics that we believe are valuable as a means of understanding the complex product trade offs inherent in industrial candidate generation systems.", "output": "TwERC: High Performance Ensembled Candidate Generation for Ads Recommendation at Twitter."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformers architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. 
Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.", "output": "Inseq: An Interpretability Toolkit for Sequence Generation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built based on the geo-textual data without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts for optimizing querying costs given a query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information during learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. 
Then, to achieve more pruning power, we build a hierarchical structure based on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.", "output": "WISK: A Workload-aware Learned Index for Spatial Keyword Queries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a federated learning (FL) system, malicious participants can easily embed backdoors into the aggregated model while maintaining the model's performance on the main task. To this end, various defenses, including training stage aggregation-based defenses and post-training mitigation defenses, have been proposed recently. While these defenses obtain reasonable performance against existing backdoor attacks, which are mainly heuristics based, we show that they are insufficient in the face of more advanced attacks. In particular, we propose a general reinforcement learning-based backdoor attack framework where the attacker first trains a (non-myopic) attack policy using a simulator built upon its local data and common knowledge on the FL system, which is then applied during actual FL training. 
Our attack framework is both adaptive and flexible and achieves strong attack performance and durability even under state-of-the-art defenses.", "output": "Learning to Backdoor Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rewrite systems [6, 10, 12] have been widely employing equality saturation [9], which is an optimisation methodology that uses a saturated e-graph to represent all possible sequences of rewrite simultaneously, and then extracts the optimal one. As such, optimal results can be achieved by avoiding the phase-ordering problem. However, we observe that when the e-graph is not saturated, it cannot represent all possible rewrite opportunities and therefore the phase-ordering problem is re-introduced during the construction phase of the e-graph. To address this problem, we propose MCTS-GEB, a domain-general rewrite system that applies reinforcement learning (RL) to e-graph construction. At its core, MCTS-GEB uses a Monte Carlo Tree Search (MCTS) [3] to efficiently plan for the optimal e-graph construction, and therefore it can effectively eliminate the phase-ordering problem at the construction phase and achieve better performance within a reasonable time. Evaluation in two different domains shows MCTS-GEB can outperform the state-of-the-art rewrite systems by up to 49x, while the optimisation can generally take less than an hour, indicating MCTS-GEB is a promising building block for the future generation of rewrite systems.", "output": "MCTS-GEB: Monte Carlo Tree Search is a Good E-graph Builder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. 
We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure that we adopt to the discrete optimization setting by using a random projection approach. We prove the convergence of our proposed algorithm, SDPSA, using a differential inclusions approach and show that it finds the optimal value of n in n-step TD. Through experiments, we show that the optimal value of n is achieved with SDPSA for arbitrary initial values.", "output": "n-Step Temporal Difference Learning with Optimal n."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. 
Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.", "output": "CoLT5: Faster Long-Range Transformers with Conditional Computation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data share. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We also propose an FL strategy that leverages synthetically generated data to overcome class and size imbalances. We also describe the sources of data heterogeneity in the context of FL, and show how even among the correctly labeled populations, disparities can arise due to these biases.", "output": "AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep ensemble is a simple and straightforward approach for approximating Bayesian inference and has been successfully applied to many classification tasks. This study aims to comprehensively investigate this approach in the multi-output regression task to predict the aerodynamic performance of a missile configuration. 
By scrutinizing the effect of the number of neural networks used in the ensemble, an obvious trend toward underconfidence in estimated uncertainty is observed. In this context, we propose the deep ensemble framework that applies the post-hoc calibration method, and its improved uncertainty quantification performance is demonstrated. It is compared with Gaussian process regression, the most prevalent model for uncertainty quantification in engineering, and is proven to have superior performance in terms of regression accuracy, reliability of estimated uncertainty, and training efficiency. Finally, the impact of the suggested framework on the results of Bayesian optimization is examined, showing that whether or not the deep ensemble is calibrated can result in completely different exploration characteristics. This framework can be seamlessly applied and extended to any regression task, as no special assumptions have been made for the specific problem used in this study.", "output": "Towards Reliable Uncertainty Quantification via Deep Ensembles in Multi-output Regression Task."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the realm of machine learning, the study of anomaly detection and localization within image data has gained substantial traction, particularly for practical applications such as industrial defect detection. While the majority of existing methods predominantly use Convolutional Neural Networks (CNN) as their primary network architecture, we introduce a novel approach based on the Transformer backbone network. Our method employs a two-stage incremental learning strategy. During the first stage, we train a Masked Autoencoder (MAE) model solely on normal images. In the subsequent stage, we apply pixel-level data augmentation techniques to generate corrupted normal images and their corresponding pixel labels. 
This process allows the model to learn how to repair corrupted regions and classify the status of each pixel. Ultimately, the model generates a pixel reconstruction error matrix and a pixel anomaly probability matrix. These matrices are then combined to produce an anomaly scoring matrix that effectively detects abnormal regions. When benchmarked against several state-of-the-art CNN-based methods, our approach exhibits superior performance on the MVTec AD dataset, achieving an impressive 97.6% AUC.", "output": "ISSTAD: Incremental Self-Supervised Learning Based on Transformer for Anomaly Detection and Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) aims to learn a single global model that enables the central server to help the model training in local clients without accessing their local data. The key challenge of FL is the heterogeneity of local data in different clients, such as heterogeneous label distribution and feature shift, which could lead to significant performance degradation of the learned models. Although many studies have been proposed to address the heterogeneous label distribution problem, few studies attempt to explore the feature shift issue. To address this issue, we propose a simple yet effective algorithm, namely \\textbf{p}ersonalized \\textbf{Fed}erated learning with \\textbf{L}ocal \\textbf{A}ttention (pFedLA), by incorporating the attention mechanism into personalized models of clients while keeping the attention blocks client-specific. Specifically, two modules are proposed in pFedLA, i.e., the personalized single attention module and the personalized hybrid attention module. In addition, the proposed pFedLA method is quite flexible and general as it can be incorporated into any FL method to improve their performance without introducing additional communication costs. 
Extensive experiments demonstrate that the proposed pFedLA method can boost the performance of state-of-the-art FL methods on different tasks such as image classification and object detection tasks.", "output": "Personalized Federated Learning with Local Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of artificial intelligence for science, it is consistently an essential challenge to face a limited amount of labeled data for real-world problems. The prevailing approach is to pretrain a powerful task-agnostic model on a large unlabeled corpus but may struggle to transfer knowledge to downstream tasks. In this study, we propose InstructBio, a semi-supervised learning algorithm, to take better advantage of unlabeled examples. It introduces an instructor model to provide the confidence ratios as the measurement of pseudo-labels' reliability. These confidence scores then guide the target model to pay distinct attention to different data points, avoiding the over-reliance on labeled data and the negative influence of incorrect pseudo-annotations. Comprehensive experiments show that InstructBio substantially improves the generalization ability of molecular models, in not only molecular property predictions but also activity cliff estimations, demonstrating the superiority of the proposed method. Furthermore, our evidence indicates that InstructBio can be equipped with cutting-edge pretraining methods and used to establish large-scale and task-specific pseudo-labeled molecular datasets, which reduces the predictive errors and shortens the training process. 
Our work provides strong evidence that semi-supervised learning can be a promising tool to overcome the data scarcity limitation and advance molecular representation learning.", "output": "InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The integration of renewable energy sources into the power grid is becoming increasingly important as the world moves towards a more sustainable energy future in line with SDG 7. However, the intermittent nature of renewable energy sources can make it challenging to manage the power grid and ensure a stable supply of electricity, which is crucial for achieving SDG 9. In this paper, we propose a deep learning-based approach for predicting energy demand in a smart power grid, which can improve the integration of renewable energy sources by providing accurate predictions of energy demand. Our approach aligns with SDG 13 on climate action as it enables more efficient management of renewable energy resources. We use long short-term memory networks, which are well-suited for time series data, to capture complex patterns and dependencies in energy demand data. The proposed approach is evaluated using four datasets of historical short term energy demand data from different energy distribution companies including American Electric Power, Commonwealth Edison, Dayton Power and Light, and Pennsylvania-New Jersey-Maryland Interconnection. The proposed model is also compared with three other state of the art forecasting algorithms namely, Facebook Prophet, Support Vector Regressor, and Random Forest Regressor. 
The experimental results show that the proposed REDf model can accurately predict energy demand with a mean absolute error of 1.4%, indicating its potential to enhance the stability and efficiency of the power grid and contribute to achieving SDGs 7, 9, and 13. The proposed model also has the potential to manage the integration of renewable energy sources in an effective manner.", "output": "Predicting Short Term Energy Demand in Smart Grid: A Deep Learning Approach for Integrating Renewable Energy Sources in Line with SDGs 7, 9, and 13."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the past few years, more and more AI applications have been applied to edge devices. However, models trained by data scientists with machine learning frameworks, such as PyTorch or TensorFlow, can not be seamlessly executed on edge. In this paper, we develop an end-to-end code generator parsing a pre-trained model to C source libraries for the backend using MicroTVM, a machine learning compiler framework extension addressing inference on bare metal devices. An analysis shows that specific compute-intensive operators can be easily offloaded to the dedicated accelerator with a Universal Modular Accelerator (UMA) interface, while others are processed in the CPU cores. By using the automatically generated ahead-of-time C runtime, we conduct a hand gesture recognition experiment on an ARM Cortex M4F core.", "output": "Deploying Machine Learning Models to Ahead-of-Time Runtime on Edge Using MicroTVM."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent data regulations necessitate machine unlearning (MU): The removal of the effect of specific examples from the model. 
While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed `prune first, then unlearn' and `sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at", "output": "Model Sparsification Can Simplify Machine Unlearning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The most relevant problems in discounted reinforcement learning involve estimating the mean of a function under the stationary distribution of a Markov reward process, such as the expected return in policy evaluation, or the policy gradient in policy optimization. In practice, these estimates are produced through a finite-horizon episodic sampling, which neglects the mixing properties of the Markov process. It is mostly unclear how this mismatch between the practical and the ideal setting affects the estimation, and the literature lacks a formal study on the pitfalls of episodic sampling, and how to do it optimally. 
In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor. Then, we provide a statistical analysis on a set of notable estimators and the corresponding sampling procedures, which includes the finite-horizon estimators often used in practice. Crucially, we show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties w.r.t. the alternative estimators, as it matches the lower bound without requiring a careful tuning of the episode horizon.", "output": "A Tale of Sampling and Estimation in Discounted Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Physics-informed neural networks (PINNs) have proven a suitable mathematical scaffold for solving inverse ordinary (ODE) and partial differential equations (PDE). Typical inverse PINNs are formulated as soft-constrained multi-objective optimization problems with several hyperparameters. In this work, we demonstrate that inverse PINNs can be framed in terms of maximum-likelihood estimators (MLE) to allow explicit error propagation from interpolation to the physical model space through Taylor expansion, without the need of hyperparameter tuning. We explore its application to high-dimensional coupled ODEs constrained by differential algebraic equations that are common in transient chemical and biological kinetics.
Furthermore, we show that singular-value decomposition (SVD) of the ODE coupling matrices (reaction stoichiometry matrix) provides reduced uncorrelated subspaces in which PINNs solutions can be represented and over which residuals can be projected. Finally, SVD bases serve as preconditioners for the inversion of covariance matrices in this hyperparameter-free robust application of MLE to ``kinetics-informed neural networks''.", "output": "Maximum-likelihood Estimators in Physics-Informed Neural Networks for High-dimensional Inverse Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input image patches with those prototypes. We propose the first medical prototype network (MProtoNet) to extend ProtoPNet to brain tumor classification with 3D multi-parametric magnetic resonance imaging (mpMRI) data. To address different requirements between 2D natural images and 3D mpMRIs especially in terms of localizing attention regions, a new attention module with soft masking and online-CAM loss is introduced. Soft masking helps sharpen attention maps, while online-CAM loss directly utilizes image-level labels when training the attention module. MProtoNet achieves statistically significant improvements in interpretability metrics of both correctness and localization coherence (with a best activation precision of $0.713\\pm0.058$) without human-annotated labels during training, when compared with GradCAM and several ProtoPNet variants.
The source code is available at", "output": "MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It has been reported that clustering-based topic models, which cluster high-quality sentence embeddings with an appropriate word selection method, can generate better topics than generative probabilistic topic models. However, these approaches suffer from the inability to select appropriate parameters and incomplete models that overlook the quantitative relation between words with topics and topics with text. To solve these issues, we propose graph to topic (G2T), a simple but effective framework for topic modelling. The framework is composed of four modules. First, document representation is acquired using pretrained language models. Second, a semantic graph is constructed according to the similarity between document representations. Third, communities in document semantic graphs are identified, and the relationship between topics and documents is quantified accordingly. Fourth, the word--topic distribution is computed based on a variant of TFIDF. Automatic evaluation suggests that G2T achieved state-of-the-art performance on both English and Chinese documents with different lengths. Human judgements demonstrate that G2T can produce topics with better interpretability and coverage than baselines. In addition, G2T can not only determine the topic number automatically but also give the probabilistic distribution of words in topics and topics in documents.
Finally, G2T is publicly available, and the distillation experiments provide instruction on how it works.", "output": "G2T: A Simple but Effective Framework for Topic Modeling based on Pretrained Language Model and Community Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}] \ No newline at end of file