From fc0952e47351324381b193757f5a7ccc52f02c9c Mon Sep 17 00:00:00 2001
From: wangrongsheng
Date: Sun, 30 Jul 2023 08:45:29 +0800
Subject: [PATCH] * update 2023-07-30 08:45:29

---
 data/2023-07-30.json | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 data/2023-07-30.json

diff --git a/data/2023-07-30.json b/data/2023-07-30.json
new file mode 100644
index 0000000..2bf7830
--- /dev/null
+++ b/data/2023-07-30.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recognizing handwritten digits is a challenging task primarily due to the diversity of writing styles and the presence of noisy images. The widely used MNIST dataset, which is commonly employed as a benchmark for this task, includes distorted digits with irregular shapes, incomplete strokes, and varying skew in both the training and testing datasets. Consequently, these factors contribute to reduced accuracy in digit recognition. To overcome this challenge, we propose a two-stage deep learning approach. In the first stage, we create a simple neural network to identify distorted digits within the training set. This model serves to detect and filter out such distorted and ambiguous images. In the second stage, we exclude these identified images from the training dataset and proceed to retrain the model using the filtered dataset. This process aims to improve the classification accuracy and confidence levels while mitigating issues of underfitting and overfitting. Our experimental results demonstrate the effectiveness of the proposed approach, achieving an accuracy rate of over 99.5% on the testing dataset. This significant improvement showcases the potential of our method in enhancing digit classification accuracy. In our future work, we intend to explore the scalability of this approach and investigate techniques to further enhance accuracy by reducing the size of the training data.", "output": "Pruning Distorted Images in MNIST Handwritten Digits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study a class of non-convex and non-smooth problems with \\textit{rank} regularization to promote sparsity in the optimal solution. We propose to apply the proximal gradient descent method to solve the problem and accelerate the process with a novel support set projection operation on the singular values of the intermediate update. We show that our algorithms are able to achieve a convergence rate of $O(\\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth and convex problems. Strict sparsity can be expected, and the support set of singular values during each update is monotonically shrinking, which, to the best of our knowledge, is novel in momentum-based algorithms.", "output": "Strictly Low Rank Constraint Optimization -- An Asymptotically $\\mathcal{O}(\\frac{1}{t^2})$ Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption.
However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, and a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at", "output": "Multi-objective Deep Reinforcement Learning for Mobile Edge Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents an AI-assisted programming tool called Copilot for Xcode for program composition and design to support human software developers. By seamlessly integrating cloud-based Large Language Models (LLM) with Apple's local development environment, Xcode, this tool enhances productivity and unleashes creativity for software development in the Apple software ecosystem (e.g., iOS apps, macOS). Leveraging advanced natural language processing (NLP) techniques, Copilot for Xcode effectively processes source code tokens and patterns within code repositories, enabling features such as code generation, autocompletion, documentation, and error detection. Software developers can also query and make \"small\" decisions for program composition, some of which can be made simultaneously, and this is facilitated through prompt engineering in a chat interface of Copilot for Xcode. Finally, we present simple case studies as evidence of the effectiveness of utilizing NLP in Xcode to prompt popular LLM services like OpenAI ChatGPT for program composition and design.", "output": "Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We are in the process of building complex highly autonomous systems that have built-in beliefs, perceive their environment and exchange information. These systems construct their respective world view and, based on it, they plan their future manoeuvres, i.e., they choose their actions in order to establish their goals based on their prediction of the possible futures. Usually these systems face an overwhelming flood of information provided by a variety of sources, where by far not everything is relevant.
The goal of our work is to develop a formal approach to determine what is relevant for a safety-critical autonomous system at its current mission, i.e., what information suffices to build an appropriate world view to accomplish its mission goals.", "output": "Framing Relevance for Safety-Critical Autonomous Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study presents an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe to classify gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset. The results were compared against well-known transformers such as BERT, Electra, Roberta, XLNet, Distilbert, and their LSTM ensembles. Our model outperformed all other models in terms of accuracy, precision, recall, F1 score, and Mean Squared Error. Surprisingly, it also needed less training time, resulting in a perfect combination of performance and efficiency. This study demonstrates the utility of ensemble models for difficult tasks such as gene mutation classification.", "output": "A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst-case cost over the ambiguity set of probability distributions, has been widely applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE), with the itErative Active SEt method (EASE) to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., the constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge, and the iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence and remain robust against data heterogeneity as well as malicious attacks, but also trade off robustness with performance.", "output": "Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a general approach for synthesizing procedural models of the state-transitions of a given discrete system.
The approach is general in that it accepts different target languages for modeling the state-transitions of a discrete system; different model acquisition tasks with different target languages, such as the synthesis of STRIPS action models or the update rule of a cellular automaton, fit as particular instances of our general approach. We follow an inductive approach to synthesis, meaning that a set of examples of state-transitions, represented as (pre-state, action, post-state) tuples, is given as input. The goal is to synthesize a structured program that, when executed on a given pre-state, outputs its associated post-state. Our synthesis method implements a combinatorial search in the space of well-structured terminating programs that can be built using a Random-Access Machine (RAM) with a minimalist instruction set and a finite amount of memory. The combinatorial search is guided with functions that assess the complexity of the candidate programs, as well as their fitness to the given input set of examples.", "output": "Synthesis of Procedural Models for Deterministic Transition Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advancement of Large Language Models (LLMs), including GPT-4, provides exciting new opportunities for generative design. We investigate the application of this tool across the entire design and manufacturing workflow. Specifically, we scrutinize the utility of LLMs in tasks such as: converting a text-based prompt into a design specification, transforming a design into manufacturing instructions, producing a design space and design variations, computing the performance of a design, and searching for designs predicated on performance. Through a series of examples, we highlight both the benefits and the limitations of the current LLMs. By exposing these limitations, we aspire to catalyze the continued improvement and progression of these models.", "output": "How Can Large Language Models Help Humans in Design and Manufacturing?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not handle and multiple rounds of network communication to achieve state-of-the-art performance. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on edge and learning to ensemble them where data on edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance.
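The server-side ensemble idea described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' EdgeConvEns implementation: scikit-learn classifiers stand in for the FPGA-trained weak models, and class probabilities stand in for the learned representations shipped to the server.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each "edge device" sees a heterogeneous slice of the data and trains a weak model.
edge_slices = [slice(0, 200), slice(200, 400), slice(400, 600)]
edge_models = []
for s in edge_slices:
    m = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    m.fit(X[s], y[s])
    edge_models.append(m)

# Devices transmit learned representations (here: predicted class probabilities)
# instead of raw data; the server fits an ensemble model on top of them.
def edge_features(X):
    return np.hstack([m.predict_proba(X) for m in edge_models])

server = LogisticRegression().fit(edge_features(X), y)
print("ensemble accuracy:", server.score(edge_features(X), y))

The design point the sketch preserves is that only learned features cross the network, which is what keeps raw data on the devices.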
Extensive experiments demonstrate that EdgeConvEns can outperform state-of-the-art performance with fewer communications and less data in various training scenarios.", "output": "EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data dependency of those methods but can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.", "output": "When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to the server.
Extensive studies on four datasets show that HyperFed is effective in enhancing the performance of FL under non-IID settings.", "output": "HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical dual-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes to evade RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the dual-space attack against graph anomaly detection in which the graph space relies on the feature space.", "output": "Dual-Space Attacks against Random-Walk-based Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. In terms of extracting low-order structural information, graph isomorphism algorithms analyze the structural equivalence to reduce the solver space dimension, which demonstrates its power in many applications, such as protein design, chemical pathways, and community detection. For the more commonly occurring high-order relationships in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, the existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the abovementioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, which are the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel.
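For readers unfamiliar with the test that the hypergraph variant above generalizes, here is a minimal sketch of classical 1-WL color refinement on ordinary graphs. It is only the textbook graph version, not the paper's hypergraph algorithm:

# Classical 1-WL color refinement: iteratively recolor each node by hashing
# its own color together with the multiset of its neighbors' colors.
def wl_refine(adj, rounds=3):
    """adj: {node: iterable of neighbors}. Returns a final color per node."""
    colors = {v: 0 for v in adj}  # start with uniform colors
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Canonically rename signatures to small integers for the next round.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

# Two graphs can be isomorphic only if their refined color histograms match.
g1 = {0: [1], 1: [0, 2], 2: [1]}        # path on 3 nodes
g2 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # triangle
hist = lambda c: sorted(c.values())
print(hist(wl_refine(g1)) == hist(wl_refine(g2)))  # False: WL separates them

Counting how often each refined color occurs across iterations is also the basis of WL subtree kernels, the construction the paper lifts from graphs to hyperedges.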
In order to fulfill our research objectives, a comprehensive set of experiments was meticulously designed, including seven graph classification datasets and 12 hypergraph classification datasets. Results on hypergraph classification datasets show significant improvements compared to other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, we found that our proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.", "output": "Hypergraph Isomorphism Computation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt pure data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture, named PDE-Net++, that is explicitly embedded with partial prior knowledge of the underlying PDEs. Furthermore, we introduce two distinct options called the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL) for the difference operators. Numerous numerical experiments have demonstrated that PDE-Net++ has superior prediction accuracy and better extrapolation performance than black-box models.", "output": "Learning to simulate partially known spatio-temporal dynamics with trainable difference operators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Immunotherapy emerges as a promising approach for treating cancer. Encouraging findings have validated the efficacy of immunotherapy medications in addressing tumors, resulting in prolonged survival rates and notable reductions in toxicity compared to conventional chemotherapy methods. However, the pool of eligible patients for immunotherapy remains relatively small, indicating a lack of comprehensive understanding regarding the physiological mechanisms responsible for favorable treatment response in certain individuals while others experience limited benefits. To tackle this issue, the authors present an innovative strategy that harnesses a non-linear cellular architecture in conjunction with a deep downstream classifier. This approach aims to carefully select and enhance 2D features extracted from chest-abdomen CT images, thereby improving the prediction of treatment outcomes. The proposed pipeline has been meticulously designed to seamlessly integrate with an advanced embedded Point of Care system. In this context, the authors present a compelling case study focused on Metastatic Urothelial Carcinoma (mUC), a particularly aggressive form of cancer. Performance evaluation of the proposed approach underscores its effectiveness, with an impressive overall accuracy of approximately 93%.", "output": "Non-Linear Self Augmentation Deep Pipeline for Cancer Treatment outcome Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images.
Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at", "output": "Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Big data and machine learning tools have jointly empowered humans in making data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon, where aggregated and subgroup-level associations contradict each other, causing cognitive confusion and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed \"de-paradox\" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.", "output": "VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Speech enhancement (SE) is crucial for reliable communication devices or robust speech recognition systems.
Although conventional artificial neural networks (ANN) have demonstrated remarkable performance in SE, they require significant computational power, along with high energy costs. In this paper, we propose a novel approach to SE using a spiking neural network (SNN) based on a U-Net architecture. SNNs are suitable for processing data with a temporal dimension, such as speech, and are known for their energy-efficient implementation on neuromorphic hardware. As such, SNNs are interesting candidates for real-time applications on devices with limited resources. The primary objective of the current work is to develop an SNN-based model with comparable performance to a state-of-the-art ANN model for SE. We train a deep SNN using surrogate-gradient-based optimization and evaluate its performance using perceptual objective tests under different signal-to-noise ratios and real-world noise conditions. Our results demonstrate that the proposed energy-efficient SNN model outperforms the Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge) baseline solution and achieves acceptable performance compared to an equivalent ANN model.", "output": "Single Channel Speech Enhancement Using U-Net Spiking Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms.
The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.", "output": "Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We improve reliable, long-horizon, goal-directed navigation in partially-mapped environments by using non-locally available information to predict the goodness of temporally-extended actions that enter unseen space. Making predictions about where to navigate in general requires non-local information: any observations the robot has seen so far may provide information about the goodness of a particular direction of travel. Building on recent work in learning-augmented model-based planning under uncertainty, we present an approach that can both rely on non-local information to make predictions (via a graph neural network) and is reliable by design: it will always reach its goal, even when learning does not provide accurate predictions. We conduct experiments in three simulated environments in which non-local information is needed to perform well. In our large-scale university building environment, generated from real-world floorplans to scale, we demonstrate a 9.3% reduction in cost-to-go compared to a non-learned baseline and a 14.9% reduction compared to a learning-informed planner that can only use local information to inform its predictions.", "output": "Improving Reliable Navigation under Uncertainty via Predictions Informed by Non-Local Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-resolution tactile sensing can provide accurate information about local contact in contact-rich robotic tasks. However, the deployment of such tasks in unstructured environments remains under-investigated. To improve the robustness of tactile robot control in unstructured environments, we propose and study a new concept: \\textit{tactile saliency} for robot touch, inspired by the human touch attention mechanism from neuroscience and the visual saliency prediction problem from computer vision. In analogy to visual saliency, this concept involves identifying key information in tactile images captured by a tactile sensor. While visual saliency datasets are commonly annotated by humans, manually labelling tactile images is challenging due to their counterintuitive patterns. To address this challenge, we propose a novel approach comprised of three interrelated networks: 1) a Contact Depth Network (ConDepNet), which generates a contact depth map to localize deformation in a real tactile image that contains target and noise features; 2) a Tactile Saliency Network (TacSalNet), which predicts a tactile saliency map to describe the target areas for an input contact depth map; and 3) a Tactile Noise Generator (TacNGen), which generates noise features to train the TacSalNet.
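A structure-only sketch of how the three networks named above could compose at training and inference time follows. Every function here is a hypothetical stand-in (the real ConDepNet, TacSalNet, and TacNGen are trained networks, and all shapes are assumptions):

import numpy as np

def condepnet(tactile_image):   # stand-in: the real model is a trained CNN
    # produce a crude "contact depth" map that localizes deformation
    return np.clip(tactile_image - tactile_image.mean(), 0, None)

def tacngen(shape, rng):        # stand-in noise generator used to train TacSalNet
    return np.abs(rng.normal(0.0, 0.1, size=shape))

def tacsalnet(depth_map):       # stand-in: the real model predicts target regions
    return (depth_map > depth_map.max() * 0.5).astype(float)

rng = np.random.default_rng(0)
raw = rng.random((32, 32))                             # fake tactile image
noisy_depth = condepnet(raw) + tacngen((32, 32), rng)  # training-style input
saliency = tacsalnet(noisy_depth)                      # salient target area
print(saliency.sum(), "salient pixels")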
Experimental results in contact pose estimation and edge-following in the presence of distractors showcase the accurate prediction of target features from real tactile images. Overall, our tactile saliency prediction approach gives robust sim-to-real tactile control in environments with unknown distractors. Project page:", "output": "Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This research draws upon cognitive psychology and information systems studies to anticipate user engagement and decision-making on digital platforms. By employing natural language processing (NLP) techniques and insights from cognitive bias research, we delve into user interactions with synonyms within digital content. Our methodology synthesizes four cognitive biases (Representativeness, Ease-of-use, Affect, and Distribution) into the READ model. Through a comprehensive user survey, we assess the model's ability to predict user engagement, discovering that synonyms that accurately represent core ideas, are easy to understand, elicit emotional responses, and are commonly encountered promote greater user engagement. Crucially, our work offers a fresh lens on human-computer interaction, digital behaviors, and decision-making processes. Our results highlight the promise of cognitive biases as potent indicators of user engagement, underscoring their significance in designing effective digital content across fields like education and marketing.", "output": "Words That Stick: Predicting Decision Making and Synonym Engagement Using Cognitive Biases and Computational Linguistics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention for intrinsically interpretable models, there is no comprehensive overview on evaluating the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality as introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps and outline future approaches for evaluation of the explanation quality of part-prototype models. This paper, therefore, contributes to the progression and maturity of this relatively new research field on interpretable part-prototype models.
We additionally provide a ``Co-12 cheat sheet'' that acts as a concise summary of our findings on evaluating part-prototype models.", "output": "The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the representation of vehicle lights in computer vision and its implications for various tasks in the field of autonomous driving. Different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, are discussed in terms of their strengths and weaknesses. Three important tasks in autonomous driving that can benefit from vehicle light detection are identified: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. Each task may require a different representation of the light. The challenges of collecting and annotating large datasets for training data-driven models are also addressed, leading to the introduction of the LISA Vehicle Lights Dataset and associated Light Visibility Model, which provides light annotations specifically designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning. A comparison of existing vehicle light datasets is provided, highlighting the unique features and limitations of each dataset. Overall, this paper provides insights into the representation of vehicle lights and the importance of accurate annotations for training effective detection models in autonomous driving applications. Our dataset and model are made available at", "output": "Patterns of Vehicle Lights: Addressing Complexities in Curation and Annotation of Camera-Based Vehicle Light Datasets and Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset and, despite achieving performance that is statistically equivalent to the state-of-the-art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people) and false negatives (e.g., failing to identify members of the search team).
The poor results in practice for algorithms that showed good results on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that are capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.", "output": "Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a novel approach to assist students with dyslexia, ADHD, and short attention span in digesting any text-based information more efficiently. The proposed solution utilizes the Multilayer Perceptron (MLP) algorithm for complex text processing and summarization tasks. The tool leverages the T5 (Text-to-Text Transfer Transformer) model from Hugging Face, which treats every NLP task as a text generation task. The model is fine-tuned on specific tasks using a smaller dataset. The NLTK's Punkt Sentence Tokenizer is used to divide a text into a list of sentences. The application is served using Flask, a lightweight web server and framework. The tool also applies principles from Bionic Reading to enhance readability, which includes a bolding function and adjustments to line, word, and character spacing. The paper discusses the methodology, implementation, and results of the AI-based speed reading tool.", "output": "Speed Reading Tool Powered by Artificial Intelligence for Students with ADHD, Dyslexia, or Short Attention Span."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $\\bigO(kN^2\\sqrt{T\\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.", "output": "Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generation-based fuzz testing can uncover various bugs and security vulnerabilities. However, compared to mutation-based fuzz testing, it takes much longer to develop a well-balanced generator that produces good test cases and decides where to break the underlying structure to exercise new code paths. We propose a novel approach to combine a trained test case generator deep learning model with a double deep Q-network (DDQN) for the first time. The DDQN guides test case creation based on a code coverage signal.
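The coverage-signal idea can be sketched as follows. Everything here is an illustrative assumption: the action set, the fake coverage probe, and a tabular one-step Q-learning update standing in for the paper's DDQN (which would use two neural networks):

import random
from collections import defaultdict

ACTIONS = ["nest_tags", "long_attr", "mutate_css", "random_unicode"]
Q = defaultdict(float)   # Q[(state, action)] -> estimated value
seen_blocks = set()      # cumulative covered code blocks

def run_target(test_case):
    # Stand-in for executing the rendering engine under instrumentation;
    # returns a fake set of covered code-block ids.
    return {hash(test_case) % 50, len(test_case) % 50}

def coverage_reward(test_case):
    new = run_target(test_case) - seen_blocks
    seen_blocks.update(new)
    return len(new)      # reward = number of newly covered blocks

state, eps, alpha = "start", 0.2, 0.5
for step in range(200):
    action = (random.choice(ACTIONS) if random.random() < eps
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    reward = coverage_reward(f"<html>{action}-{step}</html>")
    best_next = max(Q[(action, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + 0.9 * best_next - Q[(state, action)])
    state = action       # next state: the last structural choice made
print("covered blocks:", len(seen_blocks))

The point the sketch keeps from the abstract is the reward shaping: only test cases that reach new code earn a positive reward, steering generation toward structure-breaking choices.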
Our approach improves the code coverage performance of the underlying generator model by up to 18.5% for the Firefox HTML rendering engine compared to the baseline grammar-based fuzzer.", "output": "Reinforcement learning guided fuzz testing for a browser's HTML rendering engine."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to real autonomous systems without considering safety constraints. The latter are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study, we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and makes fewer collisions during training, without sacrificing the overall performance.", "output": "Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identifying traffic accidents in driving videos is crucial to ensuring the safety of autonomous driving and driver assistance systems. To address the potential danger caused by the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. However, TAD is still challenging due to the rapid movement of cameras and dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based or future object localization task, to detect accidents. However, appearance-based approaches are easily disturbed by the rapid movement of the camera and changes in illumination, which significantly reduce the performance of traffic accident detection. Methods based on future object localization may fail to capture appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., out of control of the ego-vehicle). In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos. Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks.
Further, we introduce a memory-augmented motion representation mechanism to fully explore the interrelation between different types of motion representations and exploit the high-level features of normal traffic patterns stored in memory to augment motion representations, thus enlarging the difference from anomalies. Experimental results on a recently published large-scale dataset demonstrate that our method achieves better performance compared to previous state-of-the-art approaches.", "output": "A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Explainability for Deep Learning Models is especially important for clinical applications, where decisions of automated systems have far-reaching consequences. While various post-hoc explainable methods, such as attention visualization and saliency maps, already exist for common data modalities, including natural language and images, little work has been done to adapt them to the modality of Flow CytoMetry (FCM) data. In this work, we evaluate the usage of a transformer architecture called ReluFormer that eases attention visualization, and we propose a gradient- and an attention-based visualization technique tailored for FCM. We qualitatively evaluate the visualization techniques for cell classification and polygon regression on pediatric Acute Lymphoblastic Leukemia (ALL) FCM samples. The results outline the model's decision process and demonstrate how to utilize the proposed techniques to inspect the trained model. The gradient-based visualization not only identifies cells that are most significant for a particular prediction but also indicates the directions in the FCM feature space in which changes have the most impact on the prediction. The attention visualization provides insights on the transformer's decision process when handling FCM data. We show that different attention heads specialize by attending to different biologically meaningful sub-populations in the data, even though the model retrieved solely supervised binary classification signals during training.", "output": "Explainable Techniques for Analyzing Flow Cytometry Cell Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The purpose of multi-object tracking (MOT) is to continuously track and identify objects detected in videos. Currently, most methods for multi-object tracking model the motion information and combine it with appearance information to determine and track objects. In this paper, unfalsified control is employed to address the ID-switch problem in multi-object tracking. We establish sequences of appearance information variations for the trajectories during the tracking process and design a detection and rectification module specifically for ID-switch detection and recovery. We also propose a simple and effective strategy to address the issue of ambiguous matching of appearance information during the data association process.
Experimental results on publicly available MOT datasets demonstrate that the tracker exhibits excellent effectiveness and robustness in handling tracking errors caused by occlusions and rapid movements.", "output": "The detection and rectification for identity-switch based on unfalsified control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud analysis (such as 3D segmentation and detection) is a challenging task, because of not only the irregular geometries of many millions of unordered points, but also the great variations caused by depth, viewpoint, occlusion, etc. Current studies put much focus on the adaption of neural networks to the complex geometries of point clouds, but are blind to a fundamental question: how to learn an appropriate point embedding space that is aware of both discriminative semantics and challenging variations? As a response, we propose a clustering based supervised learning scheme for point cloud analysis. Unlike the current de facto scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space for automatically discovering subclass patterns which are latent yet representative across scenes. The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations. Our algorithm is principled and readily pluggable to modern point cloud segmentation networks during training, without extra overhead during testing. With various 3D network architectures (i.e., voxel-based, point-based, Transformer-based, automatically searched), our algorithm shows notable improvements on famous point cloud segmentation datasets (i.e., 2.0-2.6% on single-scan and 2.0-2.2% on multi-scan of SemanticKITTI, 1.8-1.9% on S3DIS, in terms of mIoU). Our algorithm also demonstrates utility in 3D detection, showing 2.0-3.4% mAP gains on KITTI.", "output": "Clustering based Point Cloud Representation Learning for 3D Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Augmentation techniques and sampling strategies are crucial in contrastive learning, but in most existing works, augmentation techniques require careful design, and their sampling strategies can only capture a small amount of intrinsic supervision information. Additionally, the existing methods require complex designs to obtain two different representations of the data. To overcome these limitations, we propose a novel framework called the Self-Contrastive Graph Diffusion Network (SCGDN). Our framework consists of two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to get an excellent embedding, while DiFM balances the state of each node in the graph through Laplacian diffusion learning and allows the cooperative evolution of adjacency and feature information in the graph. Unlike existing methodologies, SCGDN is an augmentation-free approach that avoids \"sampling bias\" and semantic drift, without the need for pre-training. We conduct high-quality sampling based on structure and feature information. If two nodes are neighbors, they are considered positive samples of each other.
If two disconnected nodes are also unrelated on the $k$NN graph, they are considered negative samples for each other. The contrastive objective reasonably uses our proposed sampling strategies, and the redundancy reduction term minimizes redundant information in the embedding and can well retain more discriminative information. In this novel framework, the graph self-contrastive learning paradigm gives expression to a powerful force. SCGDN effectively balances between preserving high-order structure information and avoiding overfitting. The results show that SCGDN consistently outperforms both contrastive methods and classical methods.", "output": "Self-Contrastive Graph Diffusion Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of phase change phenomena, the lack of accessible and diverse datasets suitable for machine learning (ML) training poses a significant challenge. Existing experimental datasets are often restricted, with limited availability and sparse ground truth data, impeding our understanding of this complex multi-physics phenomenon. To bridge this gap, we present the BubbleML Dataset( which leverages physics-driven simulations to provide accurate ground truth information for various boiling scenarios, encompassing nucleate pool boiling, flow boiling, and sub-cooled boiling. This extensive dataset covers a wide range of parameters, including varying gravity conditions, flow rates, sub-cooling levels, and wall superheat, comprising 51 simulations. BubbleML is validated against experimental observations and trends, establishing it as an invaluable resource for ML research. Furthermore, we showcase its potential to facilitate exploration of diverse downstream tasks by introducing two benchmarks: (a) optical flow analysis to capture bubble dynamics, and (b) operator networks for learning temperature dynamics. The BubbleML dataset and its benchmarks serve as a catalyst for advancements in ML-driven research on multi-physics phase change phenomena, enabling the development and comparison of state-of-the-art techniques and models.", "output": "BubbleML: A Multi-Physics Dataset and Benchmarks for Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In-context learning (ICL) for large language models has proven to be a powerful approach for many natural language processing tasks. However, determining the best method to select examples for ICL is nontrivial, as the results can vary greatly depending on the quality, quantity, and order of examples used. In this paper, we conduct a case study on text simplification (TS) to investigate how to select the best and most robust examples for ICL. We propose a Metric-Based in-context Learning (MBL) method that utilizes commonly used TS metrics such as SARI, compression ratio, and BERT-Precision for selection. Through an extensive set of experiments with various-sized GPT models on standard TS benchmarks such as TurkCorpus and ASSET, we show that examples selected by the top SARI scores perform the best on larger models such as GPT-175B, while the compression ratio generally performs better on smaller models such as GPT-13B and GPT-6.7B.
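Of the metrics named above, the compression ratio is the easiest to illustrate. Here is a minimal sketch of metric-based example selection using only that metric (SARI and BERT-Precision need system outputs or embeddings); the candidate pairs are made up:

candidates = [
    ("The committee reached a unanimous decision.", "Everyone on the committee agreed."),
    ("He utilized the apparatus.", "He used the tool."),
    ("Precipitation is expected tomorrow.", "It may rain tomorrow."),
]

def compression_ratio(source, simplified):
    # Ratio of simplified length to source length, in whitespace tokens.
    return len(simplified.split()) / len(source.split())

# Lower ratio = stronger compression; take the top-k pairs as ICL examples.
ranked = sorted(candidates, key=lambda pair: compression_ratio(*pair))
prompt = "\n".join(f"Complex: {s}\nSimple: {t}" for s, t in ranked[:2])
print(prompt)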
Furthermore, we demonstrate that MBL is generally robust to example orderings and out-of-domain test sets, and outperforms strong baselines and state-of-the-art finetuned language models. Finally, we show that the behaviour of large GPT models can be implicitly controlled by the chosen metric. Our research provides a new framework for selecting examples in ICL, and demonstrates its effectiveness in text simplification tasks, breaking new ground for more accurate and efficient NLG systems.", "output": "Metric-Based In-context Learning: A Case Study in Text Simplification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports, leading to a more responsible use of AI in expediting clinical workflows.", "output": "Fact-Checking of AI-Generated Reports."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many planning formalisms allow for mixing numeric with Boolean effects. However, most of these formalisms are undecidable. In this paper, we will analyze possible causes for this undecidability by studying the number of different occurrences of actions, an approach that proved useful for metric fluents before. We will start by reformulating a numeric planning problem known as restricted tasks as a search problem. We will then show how an NP-complete fragment of numeric planning can be found by using heuristics. To achieve this, we will develop the idea of multi-valued partial order plans, a least committing compact representation for (sequential and parallel) plans. Finally, we will study optimization techniques for this representation to incorporate soft preconditions.", "output": "Multi-Valued Partial Order Plans in Numeric Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Order-Sorted Feature (OSF) logic is a knowledge representation and reasoning language based on function-denoting feature symbols and set-denoting sort symbols ordered in a subsumption lattice.
OSF logic allows the construction of record-like terms that represent classes of entities and that are themselves ordered in a subsumption relation. The unification algorithm for such structures provides an efficient calculus of type subsumption, which has been applied in computational linguistics and implemented in constraint logic programming languages such as LOGIN and LIFE and automated reasoners such as CEDAR. This work generalizes OSF logic to a fuzzy setting. We give a flexible definition of a fuzzy subsumption relation which generalizes Zadeh's inclusion between fuzzy sets. Based on this definition we define a fuzzy semantics of OSF logic where sort symbols and OSF terms denote fuzzy sets. We extend the subsumption relation to OSF terms and prove that it constitutes a fuzzy partial order with the property that two OSF terms are subsumed by one another in the crisp sense if and only if their subsumption degree is greater than 0. We show how to find the greatest lower bound of two OSF terms by unifying them and how to compute the subsumption degree between two OSF terms, and we provide the complexity of these operations.", "output": "Fuzzy order-sorted feature logic."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ever-growing use of wind energy makes necessary the optimization of turbine operations through pitch angle controllers and their maintenance with early fault detection. It is crucial to have accurate and robust models imitating the behavior of wind turbines, especially to predict the generated power as a function of the wind speed. Existing empirical and physics-based models have limitations in capturing the complex relations between the input variables and the power, aggravated by wind variability. Data-driven methods offer new opportunities to enhance wind turbine modeling from large datasets by improving accuracy and efficiency. In this study, we used physics-informed neural networks to reproduce historical data coming from 4 turbines in a wind farm, while imposing certain physical constraints on the model. The developed models for regression of the power, torque, and power coefficient as output variables showed great accuracy for both real data and the physical equations governing the system. Lastly, introducing an efficient evidential layer provided uncertainty estimations of the predictions, which proved to be consistent with the absolute error and made possible the definition of a confidence interval in the power curve.", "output": "Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have been widely employed for graph-to-text generation tasks. However, the process of finetuning LLMs requires significant training resources and annotation work. In this paper, we explore the capability of generative models to generate descriptive text from graph data in a zero-shot setting. Specifically, we evaluate GPT-3 and ChatGPT on two graph-to-text datasets and compare their performance with that of finetuned LLM models such as T5 and BART. Our results demonstrate that generative models are capable of generating fluent and coherent text, achieving BLEU scores of 10.57 and 11.08 for the AGENDA and WebNLG datasets, respectively.
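Aside: a minimal sketch of the kind of physics-informed training objective suggested by the wind-turbine abstract above, combining a data-fit term with a penalty for violating a governing equation. The power relation used here (P = 0.5 * rho * A * Cp * v^3, the standard wind-power formula) and all constants are illustrative assumptions, not the authors' exact constraints.

import numpy as np

RHO, AREA = 1.225, 5000.0  # assumed air density (kg/m^3) and rotor area (m^2)

def physics_informed_loss(v, p_true, p_pred, cp_pred, lam=0.1):
    # Data term: mean-squared error against measured power.
    data_loss = np.mean((p_pred - p_true) ** 2)
    # Physics term: residual of the wind-power relation.
    physics_residual = p_pred - 0.5 * RHO * AREA * cp_pred * v ** 3
    return data_loss + lam * np.mean(physics_residual ** 2)

v = np.array([8.0, 10.0])            # wind speeds (m/s)
p_true = np.array([1.2e6, 2.4e6])    # measured power (W)
p_pred = np.array([1.1e6, 2.5e6])    # model outputs (W)
cp_pred = np.array([0.42, 0.40])     # predicted power coefficients
print(physics_informed_loss(v, p_true, p_pred, cp_pred))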
However, our error analysis reveals that generative models still struggle with understanding the semantic relations between entities, and they also tend to generate text with hallucinations or irrelevant information. As part of the error analysis, we utilize BERT to detect machine-generated text and achieve high macro-F1 scores. We have made the text generated by generative models publicly available.", "output": "Evaluating Generative Models for Graph-to-Text Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the rapidly growing field of electronic design automation (EDA), professional software such as KiCad, Cadence, and Altium Designer provide increasingly extensive design functionalities. However, the intricate command structure and high learning curve create a barrier, particularly for novice printed circuit board (PCB) designers. This results in difficulties in selecting appropriate functions or plugins for varying design purposes, compounded by the lack of intuitive learning methods beyond traditional documentation, videos, and online forums. To address this challenge, an artificial intelligence (AI) interaction assist plugin for EDA software named SmartonAI is developed here, with KiCad taken as the first example. SmartonAI is inspired by the HuggingGPT framework and employs large language models, such as GPT and BERT, to facilitate task planning and execution. On receiving a designer request, SmartonAI conducts a task breakdown and efficiently executes relevant subtasks, such as analysis of help documentation paragraphs and execution of different plugins, along with leveraging the built-in schematic and PCB manipulation functions in both SmartonAI itself and the host software. Our preliminary results demonstrate that SmartonAI can significantly streamline the PCB design process by simplifying complex commands into intuitive language-based interactions. By harnessing the powerful language capabilities of ChatGPT and the rich design functions of KiCad, the plugin effectively bridges the gap between complex EDA software and user-friendly interaction. Meanwhile, the new paradigm behind SmartonAI can also extend to other complex software systems, illustrating the immense potential of AI-assisted user interfaces in advancing digital interactions across various domains.", "output": "New Interaction Paradigm for Complex EDA Software Leveraging GPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training an image captioner without annotated image-sentence pairs has gained traction in recent years. Previous approaches can be categorized into two strategies: crawling sentences from mismatching corpora and aligning them with the given images as pseudo annotations, or pre-training the captioner using external image-text pairs. However, the aligning setting seems to reach its performance limit due to the quality problem of pairs, and pre-training requires significant computational resources.
To address these challenges, we propose a new strategy, \"LPM + retrieval-augmented learning\", where the prior knowledge from large pre-trained models (LPMs) is leveraged as supervision, and a retrieval process is integrated to further reinforce its effectiveness. Specifically, we introduce Retrieval-augmented Pseudo Sentence Generation (RaPSG), which adopts an efficient approach to retrieve highly relevant short region descriptions from the mismatching corpora and uses them to generate, via LPMs, a variety of high-quality pseudo sentences with distinct representations. In addition, a fluency filter and a CLIP-guided training objective are further introduced to facilitate model optimization. Experimental results demonstrate that our method surpasses the SOTA pre-training model (Flamingo3B) by achieving a CIDEr score of 78.1 (+5.1) while utilizing only 0.3% of its trainable parameters (1.3B vs. 33M). Importantly, our approach eliminates the need for computationally expensive pre-training processes on external datasets (e.g., the requirement of 312M image-text pairs for Flamingo3B). We further show that, with a simple extension, the generated pseudo sentences can be deployed as weak supervision to boost the 1% semi-supervised image caption benchmark up to a 93.4 CIDEr score (+8.9), which showcases the versatility and effectiveness of our approach.", "output": "Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As public consciousness regarding the collection and use of personal information by corporations grows, it is of increasing importance that consumers be active participants in the curation of corporate datasets. In light of this, data governance frameworks such as the General Data Protection Regulation (GDPR) have outlined the right to be forgotten as a key principle allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to address the computational inefficiencies of retraining a model from scratch with each unlearning request. While these are efficient online alternatives to retraining, it is unclear how these methods impact other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results which demonstrate that our method can provably unlearn data instances while maintaining fairness objectives. Extensive experimentation with real-world datasets highlights the efficacy of our method at unlearning data instances while preserving fairness.", "output": "Fair Machine Unlearning: Data Removal while Mitigating Disparities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets.
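Aside: a minimal sketch of the labeling step just described in the Emotion4MIDI abstract, attaching an emotion label to each MIDI file based on its lyrics. The keyword lexicon below is a deliberately crude stand-in for the GoEmotions-trained classifiers the authors actually use; all names are illustrative.

def classify_emotion(lyrics: str) -> str:
    # Toy stand-in for a trained emotion classifier.
    lexicon = {"cry": "sadness", "happy": "joy", "love": "love", "fear": "fear"}
    for word, emotion in lexicon.items():
        if word in lyrics.lower():
            return emotion
    return "neutral"

songs = {"song_a.mid": "I cry alone at night",
         "song_b.mid": "So happy together, my love"}
# One emotion label per MIDI file, derived from its lyrics.
print({midi: classify_emotion(text) for midi, text in songs.items()})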
Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our code for inference, trained models, and datasets are available online.", "output": "Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern semiconductor manufacturing involves intricate production processes consisting of hundreds of operations, which can take several months from lot release to completion. The high-tech machines used in these processes are diverse, operate on individual wafers, lots, or batches in multiple stages, and necessitate product-specific setups and specialized maintenance procedures. This situation is different from traditional job-shop scheduling scenarios, which have less complex production processes and machines, and mainly focus on solving highly combinatorial but abstract scheduling problems. In this work, we address the scheduling of realistic semiconductor manufacturing processes by modeling their specific requirements using hybrid Answer Set Programming with difference logic, incorporating flexible machine processing, setup, batching and maintenance operations. Unlike existing methods that schedule semiconductor manufacturing processes locally with greedy heuristics or by independently optimizing specific machine group allocations, we examine the potential of large-scale scheduling subject to multiple optimization objectives.", "output": "Hybrid ASP-based multi-objective scheduling of semiconductor manufacturing processes (Extended version)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such a language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches in 7 brain network datasets and compare the counterfactual statements generated according to several widely-used metrics.
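Aside: a minimal sketch of the triangle-based counterfactual search just described, greedily closing open triangles until a graph classifier flips its prediction. The classifier below is a stub (a simple density threshold) standing in for the trained graph classifier assumed by the paper.

import itertools
import networkx as nx

def classify(g: nx.Graph) -> int:
    return int(nx.density(g) > 0.5)  # stub classifier

def triangle_counterfactual(g: nx.Graph, max_steps=10):
    g = g.copy()
    original = classify(g)
    for _ in range(max_steps):
        # Candidate edges whose insertion closes at least one triangle.
        candidates = [(u, v) for u, v in itertools.combinations(g.nodes, 2)
                      if not g.has_edge(u, v)
                      and len(list(nx.common_neighbors(g, u, v))) > 0]
        if not candidates:
            return None
        # Close the triangle-richest candidate edge first.
        g.add_edge(*max(candidates,
                        key=lambda e: len(list(nx.common_neighbors(g, *e)))))
        if classify(g) != original:
            return g  # counterfactual found: prediction flipped
    return None

print(triangle_counterfactual(nx.path_graph(5)) is not None)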
Results confirm that adopting a semantically relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods.", "output": "Counterfactual Explanations for Graph Classification Through the Lenses of Density."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and exhibits significant performance improvements compared to conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.", "output": "Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR-to-image projections. Our approach outperforms state-of-the-art results by up to $\\sim$13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.", "output": "Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel semantics for the language of multi-agent only believing exploiting belief bases, and show how to use it for automatically checking formulas of this language and of its dynamic extension with private belief expansion operators.
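Aside: a minimal sketch of the pseudo-label generation mentioned in the 3D HPE abstract above, projecting LiDAR 3D points into the image plane with a pinhole camera model. The intrinsic matrix K and the example points are made-up illustrative values, not the Waymo calibration.

import numpy as np

K = np.array([[720.0, 0.0, 640.0],   # assumed pinhole intrinsics
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])

def project_to_image(points_3d: np.ndarray) -> np.ndarray:
    # points_3d: (N, 3) in camera coordinates, z > 0.
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]  # (N, 2) pixel coordinates

body_points = np.array([[0.1, -0.4, 5.0],   # e.g. a shoulder point
                        [0.0, 0.5, 5.0]])   # e.g. an ankle point
print(project_to_image(body_points))  # 2D pseudo keypoints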
We provide a PSPACE algorithm for model checking relying on a reduction to QBF and an alternative dedicated algorithm relying on the exploration of the state space. We present an implementation of the QBF-based algorithm and some experimental results on computation time in a concrete example.", "output": "Base-based Model Checking for Multi-Agent Only Believing (long version)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent surge of foundation models in computer vision and natural language processing opens up perspectives in utilizing multi-modal clinical data to train large models with strong generalizability. Yet pathological image datasets often lack biomedical text annotation and enrichment. Guiding data-efficient image diagnosis with biomedical text knowledge has therefore become of substantial interest. In this paper, we propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification. CITE injects text insights gained from language models pre-trained with a broad range of biomedical texts, adapting foundation models towards pathological image understanding. Through extensive experiments on the PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE achieves leading performance compared with various baselines, especially when training data is scarce. CITE offers insights into leveraging in-domain text knowledge to reinforce data-efficient pathological image classification. Code is available at ", "output": "Text-guided Foundation Model Adaptation for Pathological Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualise different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at The demonstration video can be found at ", "output": "CodeLens: An Interactive Tool for Visualizing Code Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy.
Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code at and an anonymized dataset at ", "output": "Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data profiling is an essential process in modern data-driven industries. One of its critical components is the discovery and validation of complex statistics, including functional dependencies, data constraints, association rules, and others. However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists. This creates a significant barrier to the adoption of these tools in the industry. Moreover, existing systems were not created with industrial-grade workloads in mind. Finally, they do not aim to provide descriptive explanations, i.e. why a given pattern is not found. It is a significant issue as it is essential to understand the underlying reasons for a specific pattern's absence to make informed decisions based on the data. Because of that, these patterns effectively hang in thin air: their application scope is rather limited, and they are rarely used by the broader public. At the same time, as we are going to demonstrate in this presentation, complex statistics can be efficiently used to solve many classic data quality problems. Desbordante is an open-source data profiler that aims to close this gap. It is built with an emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations. Furthermore, it provides seamless Python integration by offloading various costly operations, not only mining, to the C++ core. In this demonstration, we show several scenarios that allow end users to solve different data quality problems. Namely, we showcase typo detection, data deduplication, and data anomaly detection scenarios.", "output": "Solving Data Quality Problems with Desbordante: a Demo."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark.
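Aside: a minimal sketch of top-k negative sampling in the spirit of the TRON abstract above: score uniformly sampled negatives, keep only the k hardest, and apply a softmax loss over the positive plus those hard negatives. Dimensions, sample sizes, and the plain dot-product scorer are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def topk_negative_loss(session_emb, pos_emb, item_embs, n_sample=128, k=16):
    # Uniformly sample candidate negatives, then keep the k hardest ones.
    neg_ids = rng.choice(len(item_embs), size=n_sample, replace=False)
    neg_scores = item_embs[neg_ids] @ session_emb
    hard = np.sort(neg_scores)[-k:]
    # Softmax (listwise-style) loss over positive + hard negatives.
    logits = np.concatenate(([pos_emb @ session_emb], hard))
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

items = rng.normal(size=(1000, 32))  # toy item embedding table
print(topk_negative_loss(rng.normal(size=32), items[0], items))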
Furthermore, through an extensive evaluation on the CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.", "output": "PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods in 3 benchmarks: Caltech-Office, Office 31, and CRWU, where we improved over the previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.", "output": "Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at", "output": "Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many software projects implement APIs and algorithms in multiple programming languages.
Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is propagated, in a timely manner and without errors, to implementations in other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers) or machine learning models for translating code from one language to another provides limited value. Translating the entire codebase from one language to another each time is not the way developers work. In this paper, we target a novel task: translating code changes from one programming language to another using large language models (LLMs). We design and implement the first LLM, dubbed Codeditor, to tackle this task. Codeditor explicitly models code changes as edit sequences and learns to correlate changes across programming languages. To evaluate Codeditor, we collect a corpus of 6,613 aligned code changes from 8 pairs of open-source software projects implementing similar functionalities in two programming languages (Java and C#). Results show that Codeditor outperforms the state-of-the-art approaches by a large margin on all commonly used automatic metrics. Our work also reveals that Codeditor is complementary to the existing generation-based models, and their combination ensures even greater performance.", "output": "Multilingual Code Co-Evolution Using Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.", "output": "Thinker: Learning to Plan and Act."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense against adversarial examples published at IEEE S&P 2023, a top computer security conference.
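Aside: a toy sketch of the Thinker idea described above, where the agent consults a world model with imagined actions before committing a real action. The chain environment, the perfect one-step model, and the greedy lookahead policy are all illustrative simplifications, not the paper's learned world model.

class ChainEnv:
    def __init__(self, n=10):
        self.n, self.pos = n, 0
    def model(self, pos, action):
        # World model: predicts the next state without changing the env.
        return max(0, min(self.n - 1, pos + (1 if action == 1 else -1)))
    def step(self, action):
        # Real transition: actually moves the agent.
        self.pos = self.model(self.pos, action)
        return self.pos, float(self.pos == self.n - 1)

env = ChainEnv()
for _ in range(20):
    # "Think": imagine both actions in the model, then act on the better one.
    imagined = {a: env.model(env.pos, a) for a in (0, 1)}
    action = max(imagined, key=imagined.get)
    state, reward = env.step(action)
print(state, reward)  # reaches the goal state at the right end of the chain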
We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.", "output": "A LLM Assisted Exploitation of AI-Guardian."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned on text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal Generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on ", "output": "How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have shown the potential to be integrated into human daily lives. Therefore, user preference is the most critical criterion for assessing LLMs' performance in real-world scenarios. However, existing benchmarks mainly focus on measuring models' accuracy using multi-choice questions, which limits the understanding of their capabilities in real applications. We fill this gap by proposing a comprehensive Chinese benchmark SuperCLUE, named after another popular Chinese LLM benchmark CLUE. SuperCLUE encompasses three sub-tasks: actual users' queries and ratings derived from an LLM battle platform (CArena), open-ended questions with single and multiple-turn dialogues (OPEN), and closed-ended questions with the same stems as open-ended single-turn ones (CLOSE). Our study shows that accuracy on closed-ended questions is insufficient to reflect human preferences achieved on open-ended ones. At the same time, they can complement each other to predict actual user preferences.
We also demonstrate that GPT-4 is a reliable judge to automatically evaluate human preferences on open-ended questions in a Chinese context. Our benchmark will be released at ", "output": "SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Because \"out-of-the-box\" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called \"jailbreaks\" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.", "output": "Universal and Transferable Adversarial Attacks on Aligned Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations.
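Aside: a heavily simplified sketch of a greedy suffix search in the spirit of the adversarial-attack abstract above. The score function below is a stub standing in for the gradient-informed objective the paper optimizes (the log-probability of an affirmative model response); the toy vocabulary is invented for illustration.

VOCAB = ["ok", "sure", "!!", "describing", "yes", "zz"]

def score(suffix):
    # Stub objective: stands in for an LLM's affirmative-response logprob.
    return sum(tok in ("sure", "yes") for tok in suffix)

def greedy_suffix_search(length=4, iters=20):
    suffix = ["!!"] * length
    for _ in range(iters):
        for i in range(length):
            # Try every single-token swap at position i; keep the best.
            suffix[i] = max(VOCAB,
                            key=lambda t: score(suffix[:i] + [t] + suffix[i+1:]))
    return suffix

print(greedy_suffix_search())  # converges to high-scoring tokens under the stub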
Third, these scores prove effective in ranking clinical trials and excluding ineligible candidates. Our error analysis suggests that current LLMs still make some mistakes due to limited medical knowledge and domain-specific context understanding. Nonetheless, we believe the explanatory capabilities of LLMs are highly valuable. Future research is warranted on how such AI assistants can be integrated into the routine trial matching workflow in real-world settings to improve its efficiency.", "output": "Matching Patients to Clinical Trials with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) was a multidisciplinary research topic of the past decade, involving (Physical and Software) Agents and Law from the beginning, and has recently evolved towards the news-claimed Robot Lawyer since 2016. One of the first proposals for restricting the behaviour of Software Agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), security, privacy, ethical and legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of Artificial Neural Networks as Agent-based Training of a special type of regulated Artificial Neural Network that we call an Institutional Neural Network (INN). The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provides $I^*$, a language previously used to declaratively model and extend Electronic Institutions, as a means to regulate the execution of Artificial Neural Networks and their interactions with Artificial Teachers (ATs).", "output": "Towards Regulated Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation by playing a regularization role. Finally, we validate our theoretical insights on MNIST. Based on these insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10.
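Aside: a minimal sketch of the uncertainty-driven augmentation search just described: apply candidate transformations and keep the one with the highest predictive entropy under the current model. The random stub classifier and the three transformations are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def predict_proba(x):
    # Stub classifier: random logits shifted by the input mean.
    logits = rng.normal(size=10) + x.mean()
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

transforms = {"identity": lambda x: x,
              "flip": lambda x: x[:, ::-1],
              "noise": lambda x: x + rng.normal(scale=0.1, size=x.shape)}

x = rng.normal(size=(8, 8))  # toy "image"
chosen = max(transforms,
             key=lambda name: entropy(predict_proba(transforms[name](x))))
print("selected augmentation:", chosen)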
Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.", "output": "On the Generalization Effects of Linear Transformations in Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present DSN (Deep Serial Number), a simple yet effective watermarking algorithm designed specifically for deep neural networks (DNNs). Unlike traditional methods that incorporate identification signals into DNNs, our approach explores a novel Intellectual Property (IP) protection mechanism for DNNs, effectively thwarting adversaries from using stolen networks. Inspired by the success of serial numbers in safeguarding conventional software IP, we propose the first implementation of serial number embedding within DNNs. To achieve this, DSN is integrated into a knowledge distillation framework, in which a private teacher DNN is initially trained. Subsequently, its knowledge is distilled and imparted to a series of customized student DNNs. Each customer DNN functions correctly only upon input of a valid serial number. Experimental results across various applications demonstrate DSN's efficacy in preventing unauthorized usage without compromising the original DNN performance. The experiments further show that DSN is resistant to different categories of watermark attacks.", "output": "Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating 3D dances from music is an emerging research task that benefits many applications in vision and graphics. Previous works treat this task as sequence generation; however, it is challenging to render a music-aligned long-term sequence with high kinematic complexity and coherent movements. In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent rhythm-aligned movements. We name the proposed method DanceFormer, which includes two cascading kinematics-enhanced transformer-guided networks (called DanTrans) that tackle each stage, respectively. Furthermore, we propose a large-scale music-conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators rather than reconstruction or motion capture. This dataset also encodes dances as key poses and parametric motion curves apart from pose sequences, thus benefiting the training of our DanceFormer. Extensive experiments demonstrate that the proposed method, even trained on existing datasets, can generate fluent, performative, and music-matched 3D dances that surpass previous works quantitatively and qualitatively.
Moreover, the proposed DanceFormer, together with the PhantomDance dataset( are seamlessly compatible with industrial animation software, thus facilitating adaptation for various downstream applications.", "output": "DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It has long been believed that the brain is highly modular both in terms of structure and function, although recent evidence has led some to question the extent of both types of modularity. We used artificial neural networks to test the hypothesis that structural modularity is sufficient to guarantee functional specialization, and find that in general, this doesn't necessarily hold except at extreme levels. We then systematically tested which features of the environment and network do lead to the emergence of specialization. We used a simple toy environment, task and network, allowing us precise control, and show that in this setup, several distinct measures of specialization give qualitatively similar results. We further find that (1) specialization can only emerge in environments where features of that environment are meaningfully separable, (2) specialization preferentially emerges when the network is strongly resource-constrained, and (3) these findings are qualitatively similar across different network architectures, but the quantitative relationships depend on the architecture type. Finally, we show that functional specialization varies dynamically across time, and demonstrate that these dynamics depend on both the timing and bandwidth of information flow in the network. We conclude that a static notion of specialization, based on structural modularity, is likely too simple a framework for understanding intelligent systems in situations of real-world complexity. We propose that thoroughly stress testing candidate definitions of functional modularity in simplified scenarios before extending to more complex data, network models and electrophysiological recordings is likely to be a fruitful approach.", "output": "Dynamics of specialization in neural modules under resource constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning models always make a prediction, even when it is likely to be inaccurate. This behavior should be avoided in many decision support applications, where mistakes can have severe consequences. Albeit already studied in 1970, machine learning with rejection has recently gained interest. This machine learning subfield enables machine learning models to abstain from making a prediction when they are likely to make a mistake. This survey aims to provide an overview of machine learning with rejection. We introduce the conditions leading to two types of rejection, ambiguity and novelty rejection, which we carefully formalize. Moreover, we review and categorize strategies to evaluate a model's predictive and rejective quality. Additionally, we define the existing architectures for models with rejection and describe the standard techniques for learning such models.
Finally, we provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.", "output": "Machine Learning with a Reject Option: A survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many researchers have used tag information to improve the performance of recommendation techniques in recommender systems. Examining users' tags helps to capture their interests and leads to more accurate recommendations. However, since user-defined tags are chosen freely and without any restrictions, problems arise in determining their exact meaning and the similarity of tags. Using thesauri and ontologies to find the meaning of tags is not very efficient, due to their free definition by users and the use of different languages in many data sets. Therefore, this article uses mathematical and statistical methods to determine lexical similarity, together with a tag co-occurrence solution, to assign semantic similarity. In addition, because users' interests change over time, this article also considers the time of tag assignments when determining the co-occurrence-based similarity of tags. A graph is then created based on tag similarity. To model the interests of the users, communities of tags are determined using community detection methods, and recommendations are made based on the tag communities and the similarity between resources. The performance of the proposed method has been evaluated using the two criteria of precision and recall on two public datasets. The evaluation results show that the precision and recall of the proposed method have significantly improved compared to the other methods: recall and precision improve by an average of 5% and 7%, respectively.", "output": "Graph-Based Recommendation System Enhanced with Community Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance, yet they still follow a parametric learning paradigm; the oblivion and rote memorization problems in learning may lead to unstable generalization. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets.
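Aside: a minimal sketch of the tag-graph pipeline just described: connect tags by a co-occurrence-based similarity, then group them into communities that stand for user interests. The co-occurrence counts and the weight threshold are illustrative assumptions.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy co-occurrence counts between tag pairs.
cooccurrence = {("python", "pandas"): 12, ("python", "numpy"): 10,
                ("pandas", "numpy"): 8, ("rock", "guitar"): 9,
                ("guitar", "blues"): 7, ("python", "guitar"): 1}

g = nx.Graph()
for (a, b), w in cooccurrence.items():
    if w >= 2:  # drop weak, likely-noisy co-occurrences
        g.add_edge(a, b, weight=w)

# Each community approximates one user-interest cluster of tags.
for community in greedy_modularity_communities(g, weight="weight"):
    print(sorted(community))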
Detailed analysis of memorization indeed reveals that RetroPrompt can reduce the reliance of language models on memorization, thus improving generalization for downstream tasks. Code is available at", "output": "Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work we propose an algorithm for trace recovery from stochastically known logs, a setting that is becoming more common with the increasing number of sensors and predictive models that generate uncertain data. The suggested approach calculates the conformance between a process model and a stochastically known trace and recovers the best alignment within this stochastic trace as the true trace. The paper offers an analysis of the impact of various cost models on trace recovery accuracy and makes use of a product multi-graph to compare alternative trace recovery options. The average accuracy of our approach, evaluated using two publicly available datasets, is impressive, with an average recovery accuracy score of 90-97%, significantly improving on a common heuristic that chooses the most likely value for each uncertain activity. We believe that the effectiveness of the proposed algorithm in recovering correct traces from stochastically known logs may be a powerful aid for developing credible decision-making tools in uncertain settings.", "output": "Trace Recovery from Stochastically Known Logs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One powerful paradigm in visual navigation is to predict actions from observations directly. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data inefficient. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at ", "output": "Visual Pre-training for Navigation: What Can We Learn from Noise?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks.
The latter shows that, different from bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\\sigma$ into another one $\\tau$, but also the precise cycle structure of $\\sigma\\tau^{-1}$. For this reason, we also regard the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.", "output": "Runtime Analysis for Permutation-based Evolutionary Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of \"acyclicity\"; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.", "output": "On the non-efficient PAC learnability of conjunctive queries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state `task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning.
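Aside: a minimal sketch of the scramble mutation operator discussed in the permutation-EA abstract above: pick a random set of positions and randomly permute the entries at those positions. The fixed subset size k is an illustrative simplification (the heavy-tailed variant would draw k from a power-law distribution instead).

import random

def scramble(perm, k=3, rng=random):
    # Choose k positions and shuffle the values sitting at them.
    positions = rng.sample(range(len(perm)), k)
    values = [perm[i] for i in positions]
    rng.shuffle(values)
    child = list(perm)
    for pos, val in zip(positions, values):
        child[pos] = val
    return child

random.seed(0)
parent = list(range(8))
print(scramble(parent))  # e.g. a permutation differing in up to 3 positions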
Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks.", "output": "Learning Task Automata for Reinforcement Learning using Hidden Markov Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have become compelling models designed to perform learning and inference on graph-structured data. However, little work has been done to understand the fundamental limitations of GNNs for scaling to larger graphs and generalizing to out-of-distribution (OOD) inputs. In this paper, we use a random graph generator to systematically investigate how the graph size and structural properties affect the predictive performance of GNNs. We present specific evidence that the average node degree is a key feature in determining whether GNNs can generalize to unseen graphs, and that the use of multiple node update functions can improve the generalization performance of GNNs when dealing with graphs of multimodal degree distributions. Accordingly, we propose a multi-module GNN framework that allows the network to adapt flexibly to new graphs by generalizing a single canonical nonlinear transformation over aggregated inputs. Our results show that the multi-module GNNs improve the OOD generalization on a variety of inference tasks in the direction of diverse structural features.", "output": "Towards Better Generalization with Flexible Representation of Multi-Module Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, it is still an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are more scarce. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework is able to find better solutions close to optimality without requiring a large number of backtracks while being generic.", "output": "Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. 
LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates seven scientific/medical, three general-domain and five math word question answering datasets.", "output": "ThoughtSource: A central hub for large language model reasoning data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing causal models for link prediction assume an underlying set of inherent node factors -- an innate characteristic defined at the node's birth -- that governs the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent: The outcome of link interventions depends on existing links. Unfortunately, these existing causal methods are not designed for path-dependent link formation, as the cascading functional dependencies between links (arising from path dependence) are either unidentifiable or require an impractical number of control variables. To overcome this, we develop the first causal model capable of dealing with path dependencies in link prediction. In this work we introduce the concept of causal lifting, an invariance in causal models of independent interest that, on graphs, allows the identification of causal link prediction queries using limited interventional data. Further, we show how structural pairwise embeddings exhibit lower bias and correctly represent the task's causal structure, as opposed to existing node embeddings, e.g., graph neural network node embeddings and matrix factorization. Finally, we validate our theoretical findings on three scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation and consumer-product recommendations.", "output": "Causal Lifting and Link Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the application of emerging machine learning methods from image super-resolution (SR) to the task of statistical downscaling. We specifically focus on convolutional neural network-based Generative Adversarial Networks (GANs). Our GANs are conditioned on low-resolution (LR) inputs to generate high-resolution (HR) surface winds emulating Weather Research and Forecasting (WRF) model simulations over North America. Unlike traditional SR models, where LR inputs are idealized coarsened versions of the HR images, WRF emulation involves using non-idealized LR and HR pairs resulting in shared-scale mismatches due to internal variability. Our study builds upon current SR-based statistical downscaling by experimenting with a novel frequency-separation (FS) approach from the computer vision field. 
To assess the skill of SR models, we carefully select evaluation metrics, and focus on performance measures based on spatial power spectra. Our analyses reveal how GAN configurations influence spatial structures in the generated fields, particularly biases in spatial variability spectra. Using power spectra to evaluate the FS experiments reveals that successful applications of FS in computer vision do not translate to climate fields. However, the FS experiments demonstrate the sensitivity of power spectra to a commonly used GAN-based SR objective function, which helps interpret and understand its role in determining spatial structures. This result motivates the development of a novel partial frequency-separation scheme as a promising configuration option. We also quantify the influence on GAN performance of non-idealized LR fields resulting from internal variability. Furthermore, we conduct a spectra-based feature-importance experiment allowing us to explore the dependence of the spatial structure of generated fields on different physically relevant LR covariates.", "output": "Algorithmic Hallucinations of Near-Surface Winds: Statistical Downscaling with Generative Adversarial Networks to Convection-Permitting Scales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An Electroencephalogram (EEG) is a non-invasive exam that records the electrical activity of the brain. This exam is used to help diagnose conditions such as different brain problems. EEG signals are taken for the purpose of epilepsy detection, and with the Discrete Wavelet Transform (DWT) and a machine learning classifier, they perform epilepsy detection. In epilepsy seizure detection, mainly machine learning classifiers and statistical features are used. The hidden information in the EEG signal is useful for detecting diseases affecting the brain. Sometimes it is very difficult to identify the minimum changes in the EEG in the time and frequency domains. The DWT can give a good decomposition of the signals in different frequency bands and feature extraction. We use three dimensionality reduction algorithms: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). Finally, features are selected by using a fusion rule, and at the last step three different classifiers, Support Vector Machine (SVM), Naive Bayes (NB) and K-Nearest-Neighbor (KNN), have been used individually for the classification. The proposed framework is tested on the Bonn dataset, and the simulation results provide the accuracy for the combinations of LDA and SVM 89.17%, LDA and KNN 80.42%, PCA and NB 89.92%, PCA and SVM 85.58%, PCA and KNN 80.42%, ICA and NB 82.33%, ICA and SVM 90.42%, ICA and KNN 90%, and LDA and NB 100%. The LDA and NB combination shows sensitivity, specificity, accuracy, precision, and recall of 100%, 100%, 100%, 100%, and 100%, providing an accuracy of 100% and outperforming all existing methods. 
The results prove the effectiveness of this model.", "output": "Empirical analysis of Different Dimensionality Reduction and classification Techniques for Epileptic Seizure detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robots operating in real-world environments must reason about possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is the problem of confounding, which if left untreated can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems. However, due to a lack of explicit causal semantics, POMDP planning methods are prone to confounding bias and thus in the presence of unobserved confounders may produce underperforming policies. This paper presents a novel causally-informed extension of \"anytime regularized determinized sparse partially observable tree\" (AR-DESPOT), a modern anytime online POMDP planner, using causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline the partial parameterisation of the causal model for planning, from ground truth model data. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher performing policies than AR-DESPOT.", "output": "CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is the most popular distributed machine learning technique. However, implementation of FL over modern wireless networks faces key challenges caused by (i) dynamics of the network conditions and (ii) the coexistence of multiple FL services/tasks and other network services in the system, which are not jointly considered in prior works. Motivated by these challenges, we introduce a generic FL paradigm over NextG networks, called dynamic multi-service FL (DMS-FL). We identify three unexplored design considerations in DMS-FL: (i) FL service operator accumulation, (ii) wireless resource fragmentation, and (iii) signal strength fluctuations. We take the first steps towards addressing these design considerations by proposing a novel distributed ML architecture called elastic virtualized FL (EV-FL). EV-FL unleashes the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology to execute FL services. It further constitutes a multi-time-scale FL management system that introduces three dimensions into existing FL architectures: (i) virtualization, (ii) scalability, and (iii) elasticity. Through investigating EV-FL, we reveal a series of open research directions for future work. 
We finally simulate EV-FL to demonstrate its potential in saving wireless resources and increasing fairness among FL services.", "output": "Synergies Between Federated Learning and O-RAN: Towards an Elastic Virtualized Architecture for Multiple Distributed Machine Learning Services."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a new AI challenge, the Tales of Tribute AI Competition (TOTAIC), based on a two-player deck-building card game released with the High Isle chapter of The Elder Scrolls Online. Currently, there is no other AI competition covering the Collectible Card Games (CCG) genre, and there has never been one that targets a deck-building game. Thus, apart from usual CCG-related obstacles to overcome, like randomness, hidden information, and a large branching factor, a successful approach additionally requires long-term planning and versatility. The game can be tackled with multiple approaches, including classic adversarial search, single-player planning, and Neural Networks-based algorithms. This paper introduces the competition framework, describes the rules of the game, and presents the results of a tournament between sample AI agents. The first edition of TOTAIC is hosted at the IEEE Conference on Games 2023.", "output": "Introducing Tales of Tribute AI Competition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fuzzy time series forecasting (FTSF) is a typical forecasting method with wide application. Traditional FTSF is regarded as an expert system, which leads to a loss of the ability to recognize undefined features. This is the main reason for poor forecasting with FTSF. To solve the problem, the proposed model, Differential Fuzzy Convolutional Neural Network (DFCNN), utilizes a convolutional neural network to re-implement FTSF with learnable ability. DFCNN is capable of recognizing potential information and improving forecasting accuracy. Thanks to the learnable ability of the neural network, the length of the fuzzy rules established in FTSF is extended to an arbitrary length that an expert system is not able to handle. At the same time, FTSF usually cannot achieve satisfactory performance on non-stationary time series due to their trend. The trend of a non-stationary time series causes the fuzzy set established by FTSF to be invalid and causes the forecasting to fail. DFCNN utilizes a difference algorithm to weaken the non-stationarity of a time series, so that DFCNN can forecast non-stationary time series with a low error that FTSF cannot match with satisfactory performance. After extensive experiments, DFCNN shows an excellent prediction effect, ahead of the existing FTSF and common time series forecasting algorithms. 
Finally, DFCNN provides further ideas for improving FTSF and holds continued research value.", "output": "Differential Convolutional Fuzzy Time Series Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although the Domain Generalization (DG) problem has been fast-growing in 2D image tasks, its exploration on 3D point cloud data is still insufficient and challenged by more complex and uncertain cross-domain variances with uneven inter-class modality distribution. In this paper, different from previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that only leverages a single source dataset to alleviate the unforeseen domain differences faced by a well-trained source model. Specifically, we first design a Multi-grained Sub-domain Alignment (MSA) method, which can constrain the learned representations to be domain-agnostic and discriminative, by performing a multi-grained feature alignment process between the split sub-domains from the single source dataset. Then, a Sample-level Domain-aware Attention (SDA) strategy is presented, which can selectively enhance easy-to-adapt samples from different sub-domains according to the sample-level inter-domain distance to avoid the negative transfer. Experiments demonstrate that our SUG can boost the generalization ability for unseen target domains, even outperforming the existing unsupervised domain adaptation methods that have to access extensive target domain data. Our code is available at ", "output": "SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets, which are always obtained using crowdsourcing. However, it is hard to obtain a unified and correct label via majority voting from multiple annotators for NER due to the large labeling space and complexity of this task. To address this problem, we aim to utilize the original multi-annotator labels directly. Particularly, we propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER. This model learns a token- and content-dependent confidence via an Expectation-Maximization (EM) algorithm by minimizing empirical risk. The true posterior estimator and confidence estimator perform iteratively to update the true posterior and confidence respectively. We conduct extensive experiments on both real-world and synthetic datasets, which show that our model can improve performance effectively compared with strong baselines.", "output": "A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models of artificial intelligence (AI), such as ChatGPT, find remarkable but controversial applicability in science and research. This paper reviews epistemological challenges, ethical and integrity risks in science conduct with the advent of generative AI. 
The aim is to lay new, timely foundations for a high-quality research ethics review. The role of AI language models as a research instrument and subject is scrutinized along with ethical implications for scientists, participants and reviewers. New emerging practices for research ethics review are discussed, concluding with ten recommendations that shape a response for a more responsible research conduct in the era of AI.", "output": "Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. \"scheduling a doctor's appointment without a phone\". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual settings, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete with and often surpass their larger teacher models' capabilities.", "output": "PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have achieved a milestone that undeniably changed many held beliefs in artificial intelligence (AI). However, there remain many limitations of these LLMs when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, and due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable. To address these limitations, we suggest combining the strength of symbolic representations with what we believe to be the key to the success of LLMs, namely a successful bottom-up reverse engineering of language at scale. As such we argue for a bottom-up reverse engineering of language in a symbolic setting. 
Hints on what this project amounts to have been suggested by several authors, and we discuss in some detail here how this project could be accomplished.", "output": "Towards Explainable and Language-Agnostic LLMs: Symbolic Reverse Engineering of Language at Scale."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Art curatorial practice is characterized by the presentation of an art collection in a knowledgeable way. Machine processes are characterized by their capacity to manage and analyze large amounts of data. This paper envisages AI curation and audience interaction to explore the implications of contemporary machine learning models for the curatorial world. This project was developed for the occasion of the 2023 Helsinki Art Biennial, entitled New Directions May Emerge. We use the Helsinki Art Museum (HAM) collection to re-imagine the city of Helsinki through the lens of machine perception. We use visual-textual models to place indoor artworks in public spaces, assigning fictional coordinates based on similarity scores. We transform the space that each artwork inhabits in the city by generating synthetic 360 art panoramas. We guide the generation by estimating depth values from 360 panoramas at each artwork location, and machine-generated prompts of the artworks. The result of this project is an AI curation that places the artworks in their imagined physical space, blurring the lines of artwork, context, and machine perception. The work is virtually presented as a web-based installation on this link, where users can navigate an alternative version of the city while exploring and interacting with its cultural heritage at scale.", "output": "AI Art Curation: Re-imagining the city of Helsinki in occasion of its Biennial."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian state and parameter estimation have been automated effectively in a variety of probabilistic programming languages. The process of model comparison, on the other hand, which still requires error-prone and time-consuming manual derivations, is often overlooked despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference, and model comparison can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for the straightforward extension to hierarchical and temporal model priors to accommodate the modeling of complicated time-varying processes.", "output": "Automating Model Comparison in Factor Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, encoders like ViT (vision transformer) and ResNet have been trained on vast datasets and utilized as perceptual metrics for comparing sketches and images, as well as multi-domain encoders in a zero-shot setting. However, there has been limited effort to quantify the granularity of these encoders. Our work addresses this gap by focusing on multi-modal 2D projections of individual 3D instances. This task holds crucial implications for retrieval and sketch-based modeling. 
We show that in a zero-shot setting, the more abstract the sketch, the higher the likelihood of incorrect image matches. Even within the same sketch domain, sketches of the same object drawn in different styles, for example by distinct individuals, might not be accurately matched. One of the key findings of our research is that meticulous fine-tuning on one class of 3D shapes can lead to improved performance on other shape classes, reaching or surpassing the accuracy of supervised methods. We compare and discuss several fine-tuning strategies. Additionally, we delve deeply into how the scale of an object in a sketch influences the similarity of features at different network layers, helping us identify which network layers provide the most accurate matching. Significantly, we discover that ViT and ResNet perform best when dealing with similar object scales. We believe that our work will have a significant impact on research in the sketch domain, providing insights and guidance on how to adopt large pretrained models as perceptual losses.", "output": "Fine-Tuned but Zero-Shot 3D Shape Sketch View Similarity and Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic structural causal models (SCMs) are a powerful framework for reasoning in dynamic systems about direct effects, which measure how a change in one variable affects another variable while holding all other variables constant. The causal relations in a dynamic structural causal model can be qualitatively represented with a full-time causal graph. Assuming linearity and causal sufficiency and given the full-time causal graph, the direct causal effect is always identifiable and can be estimated from data by adjusting on any set of variables given by the so-called single-door criterion. However, in many applications such a graph is not available for various reasons, but nevertheless experts have access to an abstraction of the full-time causal graph which represents causal relations between time series while omitting temporal information. This paper presents a complete identifiability result which characterizes all cases for which the direct effect is graphically identifiable from summary causal graphs and gives two sound finite adjustment sets that can be used to estimate the direct effect whenever it is identifiable.", "output": "Identifiability of direct effects from summary causal graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Incomplete utterance rewriting has recently attracted wide attention. However, previous works do not consider the semantic structural information between the incomplete utterance and the rewritten utterance, or model the semantic structure implicitly and insufficiently. To address this problem, we propose a QUEry-Enhanced Network (QUEEN). Firstly, our proposed query template explicitly brings guided semantic structural knowledge between the incomplete utterance and the rewritten utterance, making the model perceive where to refer back to or recover omitted tokens. Then, we adopt a fast and effective edit operation scoring network to model the relation between two tokens. 
Benefiting from the proposed query template and the well-designed edit operation scoring network, QUEEN achieves state-of-the-art performance on several public datasets.", "output": "Mining Clues from Incomplete Utterance: A Query-enhanced Network for Incomplete Utterance Rewriting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robotic navigation in unknown, cluttered environments with limited sensing capabilities poses significant challenges in robotics. Local trajectory optimization methods, such as Model Predictive Path Integral (MPPI), are a promising solution to this challenge. However, global guidance is required to ensure effective navigation, especially when encountering challenging environmental conditions or navigating beyond the planning horizon. This study presents the GP-MPPI, an online learning-based control strategy that integrates MPPI with a local perception model based on Sparse Gaussian Process (SGP). The key idea is to leverage the learning capability of SGP to construct a variance (uncertainty) surface, which enables the robot to learn about the navigable space surrounding it, identify a set of suggested subgoals, and ultimately recommend the optimal subgoal that minimizes a predefined cost function to the local MPPI planner. Afterward, MPPI computes the optimal control sequence that satisfies the robot and collision avoidance constraints. Such an approach eliminates the necessity of a global map of the environment or an offline training process. We validate the efficiency and robustness of our proposed control strategy through both simulated and real-world experiments of 2D autonomous navigation tasks in complex unknown environments, demonstrating its superiority in guiding the robot safely towards its desired goal while avoiding obstacles and escaping entrapment in local minima. The GPU implementation of GP-MPPI, including the supplementary video, is available at", "output": "GP-guided MPPI for Efficient Navigation in Complex Unknown Cluttered Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "What is the prospect of developing artificial general intelligence (AGI)? I investigate this question by systematically comparing living and algorithmic systems, with a special focus on the notion of \"agency.\" There are three fundamental differences to consider: (1) Living systems are autopoietic, that is, self-manufacturing, and therefore able to set their own intrinsic goals, while algorithms exist in a computational environment with target functions that are both provided by an external agent. (2) Living systems are embodied in the sense that there is no separation between their symbolic and physical aspects, while algorithms run on computational architectures that maximally isolate software from hardware. (3) Living systems experience a large world, in which most problems are ill-defined (and not all definable), while algorithms exist in a small world, in which all problems are well-defined. These three differences imply that living and algorithmic systems have very different capabilities and limitations. In particular, it is extremely unlikely that true AGI (beyond mere mimicry) can be developed in the current algorithmic framework of AI research. 
Consequently, discussions about the proper development and deployment of algorithmic tools should be shaped around the dangers and opportunities of current narrow AI, not the extremely unlikely prospect of the emergence of true agency in artificial systems.", "output": "Artificial intelligence is algorithmic mimicry: why artificial \"agents\" are not (and won't be) proper agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders and evaluators of the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.", "output": "Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics. Recently, deep SNNs trained directly have shown great success in achieving high performance on classification tasks with very few time steps. However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To address this problem, we propose EMS-YOLO, a novel directly-trained SNN framework for object detection, which is the first trial to train a deep SNN with surrogate gradients for object detection rather than ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly-trained SNN with low power consumption. Furthermore, we theoretically analyze and prove that EMS-ResNet can avoid gradient vanishing or exploding. The results demonstrate that our approach outperforms the state-of-the-art ANN-SNN conversion methods (at least 500 time steps) with extremely few time steps (only 4 time steps). 
It is shown that our model can achieve comparable performance to the ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO Dataset and the event-based Gen1 Dataset.", "output": "Deep Directly-Trained Spiking Neural Networks for Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing there exist data distributions where for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\\Theta(\\alpha)$ loss in accuracy, where $\\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\\sqrt{\\alpha})$ loss, and give a matching $\\Omega(\\sqrt{\\alpha})$ lower bound. In contrast, Konstantinov and Lampert (2021) showed that for proper learners the loss in accuracy for both notions is $\\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple \"tricks\" an adversary can use to amplify his power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy clusters into three natural regimes: $O(\\alpha)$, $O(\\sqrt{\\alpha})$ and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.", "output": "On the Vulnerability of Fairness Constrained Learning to Malicious Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, with the emergence of numerous Large Language Models (LLMs), the implementation of AI has entered a new era. Irrespective of these models' own capacity and structure, there is a growing demand for LLMs to possess enhanced comprehension of longer and more complex contexts with relatively smaller sizes. Models often encounter an upper limit when processing sequences of sentences that extend beyond their comprehension capacity, resulting in off-topic or even chaotic responses. While several recent works attempt to address this issue in various ways, they rarely focus on \"why models are unable to compensate or strengthen their capabilities on their own\". In this paper, we thoroughly investigate the nature of information transfer within LLMs and propose a novel technique called Attention Transition. This technique empowers models to achieve longer and better context comprehension with minimal additional training or impact on generation fluency. Our experiments are conducted on the challenging XSum dataset using the LLaMa-7b model with context token length ranging from 800 to 1900. 
Results demonstrate that we achieve substantial improvements compared with the original generation results, as evaluated by GPT4.", "output": "Empower Your Model with Longer and Better Context Comprehension."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have long faced the data and workload drift problem. Although both query-driven and hybrid methods are proposed to avoid this problem, even the state of the art among them suffers from high training and estimation costs, limited scalability, instability, and the long-tailed distribution problem on high-cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process, which not only reduces the inference complexity from $O(n)$ to $O(1)$ compared to Naru and UAE but also achieves higher accuracy on high-cardinality and high-dimensional tables. Experimental results show that Duet can achieve all the design goals above, is much more practical, and even has a lower inference cost on CPU than that of most learned methods on GPU.", "output": "Duet: efficient and scalable hybriD neUral rElation undersTanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many properties in the real world, such as desirability or strength in a competitive environment, cannot be directly observed, which makes them difficult to evaluate. To deal with this challenging problem, prior works have primarily focused on estimating those properties of known items, especially the strength of sports players, and only of those who appear in a paired comparison dataset. In this paper, we introduce Deep Bradley-Terry Rating (DBTR), a novel ML framework to evaluate any properties of unknown items, not necessarily present in the training data. Our method seamlessly integrates the traditional Bradley-Terry model with a neural network structure. We also generalize this architecture further for asymmetric environments with unfairness, which are much more common in real-world settings. In our experimental analysis, DBTR successfully learned the desired quantification of those properties.", "output": "Deep Bradley-Terry Rating: Estimate Properties Without Metric of Unseen Items."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. 
Another important tool for practical machine learning is the model Exponential Moving Average (EMA), which is a model copy that does not receive gradient information, but instead follows its target model with some momentum. This model EMA can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have treated the model EMA separately from optimization, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of model EMAs and demonstrate its validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, optimally a 6$\\times$ wall-clock time reduction.", "output": "How to Scale Your EMA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale text-to-image models pre-trained on massive text-image pairs have recently shown excellent performance in image synthesis. However, an image can provide more intuitive visual concepts than plain text. People may ask: how can we integrate the desired visual concept into an existing image, such as our portrait? 
Current methods are inadequate in meeting this demand as they lack the ability to preserve content or translate visual concepts effectively. Inspired by this, we propose a novel framework named visual concept translator (VCT) with the ability to preserve content in the source image and translate the visual concepts guided by a single reference image. The proposed VCT contains a content-concept inversion (CCI) process to extract contents and concepts, and a content-concept fusion (CCF) process to gather the extracted information to obtain the target image. Given only one reference image, the proposed VCT can complete a wide range of general image-to-image translation tasks with excellent results. Extensive experiments are conducted to prove the superiority and effectiveness of the proposed methods. Codes are available at", "output": "General Image-to-Image Translation with One-Shot Image Guidance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural operations that rely on neighborhood information are much more expensive when deployed on point clouds than on grid data due to the irregular distances between points in a point cloud. In a grid, on the other hand, we can compute the kernel only once and reuse it for all query positions. As a result, operations that rely on neighborhood information scale much worse for point clouds than for grid data, especially for large inputs and large neighborhoods. In this work, we address the scalability issue of point cloud methods by tackling its root cause: the irregularity of the data. We propose learnable gridification as the first step in a point cloud processing pipeline to transform the point cloud into a compact, regular grid. Thanks to gridification, subsequent layers can use operations defined on regular grids, e.g., Conv3D, which scale much better than native point cloud methods. We then extend gridification to point cloud to point cloud tasks, e.g., segmentation, by adding a learnable de-gridification step at the end of the point cloud processing pipeline to map the compact, regular grid back to its original point cloud form. Through theoretical and empirical analysis, we show that gridified networks scale better in terms of memory and time than networks directly applied on raw point cloud data, while being able to achieve competitive results. Our code is publicly available at", "output": "Learned Gridification for Efficient Point Cloud Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. 
First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.", "output": "When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to the existence of diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc. In this paper, we present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife, which is collected in diverse daily-life scenarios with rich and fine-grained annotations. Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, action recognition, etc., and we also provide benchmarks for these tasks to facilitate related research. In addition, we design novel modules for LiDAR-based segmentation and action recognition, which are more applicable for large-scale human-centric scenarios and achieve state-of-the-art performance.", "output": "Human-centric Scene Understanding for 3D Large-scale Scenarios."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g. healthcare applications. We discuss background, challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: ", "output": "A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. 
In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at", "output": "Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Aortic stenosis (AS) is a common heart valve disease that requires accurate and timely diagnosis for appropriate treatment. Most current automatic AS severity detection methods rely on black-box models with a low level of trustworthiness, which hinders clinical adoption. To address this issue, we propose ProtoASNet, a prototypical network that directly detects AS from B-mode echocardiography videos, while making interpretable predictions based on the similarity between the input and learned spatio-temporal prototypes. This approach provides supporting evidence that is clinically relevant, as the prototypes typically highlight markers such as calcification and restricted movement of aortic valve leaflets. Moreover, ProtoASNet utilizes abstention loss to estimate aleatoric uncertainty by defining a set of prototypes that capture ambiguity and insufficient information in the observed data. This provides a reliable system that can detect and explain when it may fail. We evaluate ProtoASNet on a private dataset and the publicly available TMED-2 dataset, where it outperforms existing state-of-the-art methods with an accuracy of 80.0% and 79.7%, respectively. Furthermore, ProtoASNet provides interpretability and an uncertainty measure for each prediction, which can improve transparency and facilitate the interactive usage of deep networks to aid clinical decision-making. Our source code is available at:", "output": "ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the past decades, automated high-content microscopy demonstrated its ability to deliver large quantities of image-based data powering the versatility of phenotypic drug screening and systems biology applications. However, as the sizes of image-based datasets grew, it became infeasible for humans to control, avoid and overcome the presence of imaging and sample preparation artefacts in the images. While novel techniques like machine learning and deep learning may address these shortcomings through generative image inpainting, when applied to sensitive research data this may come at the cost of undesired image manipulation. 
Undesired manipulation may be caused by phenomena such as neural hallucinations, to which some artificial neural networks are prone. To address this, here we evaluate the state-of-the-art inpainting methods for image restoration in a high-content fluorescence microscopy dataset of cultured cells with labelled nuclei. We show that architectures like DeepFill V2 and Edge Connect can faithfully restore microscopy images upon fine-tuning with relatively little data. Our results demonstrate that the area of the region to be restored is of higher importance than shape. Furthermore, to control for the quality of restoration, we propose a novel phenotype-preserving metric design strategy. In this strategy, the size and count of the restored biological phenotypes like cell nuclei are quantified to penalise undesirable manipulation. We argue that the design principles of our approach may also generalise to other applications.", "output": "Phenotype-preserving metric design for high-content image reconstruction by generative inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot semantic segmentation (FSS) offers immense potential in the field of medical image analysis, enabling accurate object segmentation with limited training data. However, existing FSS techniques heavily rely on annotated semantic classes, rendering them unsuitable for medical images due to the scarcity of annotations. To address this challenge, multiple contributions are proposed: First, inspired by spectral decomposition methods, the problem of image decomposition is reframed as a graph partitioning task. The eigenvectors of the Laplacian matrix, derived from the feature affinity matrix of self-supervised networks, are analyzed to estimate the distribution of the objects of interest from the support images. Secondly, we propose a novel self-supervised FSS framework that does not rely on any annotation. Instead, it adaptively estimates the query mask by leveraging the eigenvectors obtained from the support images. This approach eliminates the need for manual annotation, making it particularly suitable for medical images with limited annotated data. Thirdly, to further enhance the decoding of the query image based on the information provided by the support image, we introduce a multi-scale large kernel attention module. By selectively emphasizing relevant features and details, this module improves the segmentation process and contributes to better object delineation. Evaluations on both natural and medical image datasets demonstrate the efficiency and effectiveness of our method. Moreover, the proposed approach is characterized by its generality and model-agnostic nature, allowing for seamless integration with various deep architectures. The code is publicly available at ", "output": "Self-supervised Few-shot Learning for Semantic Segmentation: An Annotation-free Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We release MiDaS v3.1 for monocular depth estimation, offering a variety of new models based on different encoder backbones. This release is motivated by the success of transformers in computer vision, with a large variety of pretrained vision transformers now available. 
We explore how using the most promising vision transformers as image encoders impacts depth estimation quality and runtime of the MiDaS architecture. Our investigation also includes recent convolutional approaches that achieve comparable quality to vision transformers in image classification tasks. While the previous release MiDaS v3.0 solely leverages the vanilla vision transformer ViT, MiDaS v3.1 offers additional models based on BEiT, Swin, SwinV2, Next-ViT and LeViT. These models offer different performance-runtime tradeoffs. The best model improves the depth estimation quality by 28% while efficient models enable downstream tasks requiring high frame rates. We also describe the general process for integrating new backbones. A video summarizing the work can be found at and the code is available at", "output": "MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Purpose: This study evaluated the out-of-domain performance and generalization capabilities of automated medical image segmentation models, with a particular focus on adaptation to new image acquisitions and disease type. Materials: Datasets from both non-contrast and contrast-enhanced abdominal CT scans of healthy patients and those with polycystic kidney disease (PKD) were used. A total of 400 images (100 non-contrast controls, 100 contrast controls, 100 non-contrast PKD, 100 contrast PKD) were utilized for training/validation of models to segment kidneys, livers, and spleens, and the final models were then tested on 100 non-contrast CT images of patients affected by PKD. Performance was evaluated using Dice, Jaccard, TPR, and Precision. Results: Models trained on a diverse range of data showed no worse performance than models trained exclusively on in-domain data when tested on in-domain data. For instance, the Dice similarity of the model trained on 25% from each dataset was found to be non-inferior to the model trained purely on in-domain data. Conclusions: The results indicate that broader training examples significantly enhance model generalization and out-of-domain performance, thereby improving automated segmentation tools' applicability in clinical settings. The study's findings provide a roadmap for future research to adopt a data-centric approach in medical image AI model development.", "output": "Role of Image Acquisition and Patient Phenotype Variations in Automatic Segmentation Model Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources.
Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.", "output": "Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we introduce a challenging image restoration task, referred to as SuperInpaint, which aims to reconstruct missing regions in low-resolution images and generate completed images with arbitrarily higher resolutions. We have found that this task cannot be effectively addressed by stacking state-of-the-art super-resolution and image inpainting methods as they amplify each other's flaws, leading to noticeable artifacts. To overcome these limitations, we propose the detail-enhanced attentional implicit representation (DEAR) that can achieve SuperInpaint with a single model, resulting in high-quality completed images with arbitrary resolutions. Specifically, we use a deep convolutional network to extract the latent embedding of an input image and then enhance the high-frequency components of the latent embedding via an adaptive high-pass filter. This leads to detail-enhanced semantic embedding. We further feed the semantic embedding into an unmask-attentional module that suppresses embeddings from ineffective masked pixels. Additionally, we extract a pixel-wise importance map that indicates which pixels should be used for image reconstruction. Given the coordinates of a pixel we want to reconstruct, we first collect its neighboring pixels in the input image and extract their detail-enhanced semantic embeddings, unmask-attentional semantic embeddings, importance values, and spatial distances to the desired pixel. Then, we feed all the above terms into an implicit representation and generate the color of the specified pixel. To evaluate our method, we extend three existing datasets for this new task and build 18 meaningful baselines using SOTA inpainting and super-resolution methods.
Extensive experimental results demonstrate that our method outperforms all existing methods by a significant margin on four widely used metrics.", "output": "SuperInpaint: Learning Detail-Enhanced Attentional Implicit Representation for Super-resolutional Image Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention for intrinsically interpretable models, there is no comprehensive overview on evaluating the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality as introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps and outline future approaches for evaluation of the explanation quality of part-prototype models. This paper, therefore, contributes to the progression and maturity of this relatively new research field on interpretable part-prototype models. We additionally provide a ``Co-12 cheat sheet'' that acts as a concise summary of our findings on evaluating part-prototype models.", "output": "The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In brain tumor resection, accurate removal of cancerous tissues while preserving eloquent regions is crucial to the safety and outcomes of the treatment. However, intra-operative tissue deformation (called brain shift) can move the surgical target and render the pre-surgical plan invalid. Intra-operative ultrasound (iUS) has been adopted to provide real-time images to track brain shift, and inter-modal (i.e., MRI-iUS) registration is often required to update the pre-surgical plan. Quality control for the registration results during surgery is important to avoid adverse outcomes, but manual verification faces great challenges due to difficult 3D visualization and the low contrast of iUS. Automatic algorithms are urgently needed to address this issue, but the problem was rarely attempted. Therefore, we propose a novel deep learning technique based on 3D focal modulation in conjunction with uncertainty estimation to accurately assess MRI-iUS registration errors for brain tumor surgery. Developed and validated with the public RESECT clinical database, the resulting algorithm can achieve an estimation error of 0.59±0.57 mm.", "output": "FocalErrorNet: Uncertainty-aware focal modulation network for inter-modal registration error estimation in ultrasound-guided neurosurgery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the representation of vehicle lights in computer vision and its implications for various tasks in the field of autonomous driving. Different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, are discussed in terms of their strengths and weaknesses.
Three important tasks in autonomous driving that can benefit from vehicle light detection are identified: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. Each task may require a different representation of the light. The challenges of collecting and annotating large datasets for training data-driven models are also addressed, leading to the introduction of the LISA Vehicle Lights Dataset and associated Light Visibility Model, which provides light annotations specifically designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning. A comparison of existing vehicle light datasets is provided, highlighting the unique features and limitations of each dataset. Overall, this paper provides insights into the representation of vehicle lights and the importance of accurate annotations for training effective detection models in autonomous driving applications. Our dataset and model are made available at", "output": "Patterns of Vehicle Lights: Addressing Complexities in Curation and Annotation of Camera-Based Vehicle Light Datasets and Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Homologous anatomical landmarks between medical scans are instrumental in quantitative assessment of image registration quality in various clinical applications, such as MRI-ultrasound registration for tissue shift correction in ultrasound-guided brain tumor resection. While manually identified landmark pairs between MRI and ultrasound (US) have greatly facilitated the validation of different registration algorithms for the task, the procedure requires significant expertise, labor, and time, and can be prone to inter- and intra-rater inconsistency. So far, many traditional and machine learning approaches have been presented for anatomical landmark detection, but they primarily focus on mono-modal applications. Unfortunately, despite the clinical needs, inter-modal/contrast landmark detection has very rarely been attempted. Therefore, we propose a novel contrastive learning framework to detect corresponding landmarks between MRI and intra-operative US scans in neurosurgery. Specifically, two convolutional neural networks were trained jointly to encode image features in MRI and US scans to help match the US image patch that contains the corresponding landmarks in the MRI. We developed and validated the technique using the public RESECT database.
With a mean landmark detection accuracy of 5.88±4.79 mm against 18.78±4.77 mm with SIFT features, the proposed method offers promising results for MRI-US landmark detection in neurosurgical applications for the first time.", "output": "Towards multi-modal anatomical landmark detection for ultrasound-guided brain tumor resection with contrastive learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset and despite achieving performance that is statistically equivalent to the state-of-the-art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people), and false negatives (e.g., failing to identify members of the search team). The poor results in practice for algorithms that showed good results on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that are capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.", "output": "Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Holistic 3D human-scene reconstruction is a crucial and emerging research area in robot perception. A key challenge in holistic 3D human-scene reconstruction is to generate a physically plausible 3D scene from a single monocular RGB image. The existing research mainly proposes optimization-based approaches for reconstructing the scene from a sequence of RGB frames with explicitly defined physical laws and constraints between different scene elements (humans and objects). However, it is hard to explicitly define and model every physical law in every scenario. This paper proposes using an implicit feature representation of the scene elements to distinguish a physically plausible alignment of humans and objects from an implausible one. We propose using a graph-based holistic representation with an encoded physical representation of the scene to analyze the human-object and object-object interactions within the scene. Using this graphical representation, we adversarially train our model to learn the feasible alignments of the scene elements from the training data itself without explicitly defining the laws and constraints between them.
Unlike the existing inference-time optimization-based approaches, we use this adversarially trained model to produce a per-frame 3D reconstruction of the scene that abides by the physical laws and constraints. Our learning-based method achieves comparable 3D reconstruction quality to existing optimization-based holistic human-scene reconstruction methods and does not need inference-time optimization. This makes it better suited than existing methods for potential use in robotic applications, such as robot navigation.", "output": "Physically Plausible 3D Human-Scene Reconstruction from Monocular RGB Image using an Adversarial Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vehicle light detection is required for important downstream safe autonomous driving tasks, such as predicting a vehicle's light state to determine if the vehicle is making a lane change or turning. Currently, many vehicle light detectors use single-stage detectors which predict bounding boxes to identify a vehicle light, in a manner decoupled from vehicle instances. In this paper, we present a method for detecting a vehicle light given an upstream vehicle detection and approximation of a visible light's center. Our method predicts four approximate corners associated with each vehicle light. We experiment with CNN architectures, data augmentation, and contextual preprocessing methods designed to reduce surrounding-vehicle confusion. We achieve an average distance error from the ground truth corner of 5.09 pixels, about 17.24% of the size of the vehicle light on average. We train and evaluate our model on the LISA Lights dataset, allowing us to thoroughly evaluate our vehicle light corner detection model on a large variety of vehicle light shapes and lighting conditions. We propose that this model can be integrated into a pipeline with vehicle detection and vehicle light center detection to make a fully-formed vehicle light detection network, valuable to identifying trajectory-informative signals in driving scenes.", "output": "Robust Detection, Association, and Localization of Vehicle Lights: A Context-Based Cascaded CNN Approach and Evaluations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identifying traffic accidents in driving videos is crucial to ensuring the safety of autonomous driving and driver assistance systems. To address the potential danger caused by the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. However, TAD is still challenging due to the rapid movement of cameras and dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based or future object localization task, to detect accidents. However, appearance-based approaches are easily disturbed by the rapid movement of the camera and changes in illumination, which significantly reduce the performance of traffic accident detection. Methods based on future object localization may fail to capture appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., out of control of the ego-vehicle).
In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos. Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks. Further, we introduce a memory-augmented motion representation mechanism to fully explore the interrelation between different types of motion representations and exploit the high-level features of normal traffic patterns stored in memory to augment motion representations, thus enlarging the difference from anomalies. Experimental results on a recently published large-scale dataset demonstrate that our method achieves better performance compared to previous state-of-the-art approaches.", "output": "A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gait recognition holds the promise of robustly identifying subjects based on their walking patterns instead of color information. While previous approaches have performed well for curated indoor scenes, their applicability in unconstrained situations, e.g. outdoor, long distance scenes, is significantly impeded. We propose an end-to-end GAit DEtection and Recognition (GADER) algorithm for human authentication in challenging outdoor scenarios. Specifically, GADER leverages a Double Helical Signature to detect the fragment of human movement and incorporates a novel gait recognition method, which learns representations by distilling from an auxiliary RGB recognition model. At inference time, GADER only uses the silhouette modality but benefits from a more robust representation. Extensive experiments on indoor and outdoor datasets demonstrate that the proposed method outperforms the State-of-The-Arts for gait recognition and verification, with a significant 20.6% improvement on unconstrained, long distance scenes.", "output": "GADER: GAit DEtection and Recognition in the Wild."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study introduces a novel reconstruction method for dental cone-beam computed tomography (CBCT), focusing on effectively reducing metal-induced artifacts commonly encountered in the presence of prevalent metallic implants. Despite significant progress in metal artifact reduction techniques, challenges persist owing to the intricate physical interactions between polychromatic X-ray beams and metal objects, which are further compounded by the additional effects associated with metal-tooth interactions and factors specific to the dental CBCT data environment. To overcome these limitations, we propose an implicit neural network that generates two distinct and informative tomographic images. One image represents the monochromatic attenuation distribution at a specific energy level, whereas the other captures the nonlinear beam-hardening factor resulting from the polychromatic nature of X-ray beams.
In contrast to existing CT reconstruction techniques, the proposed method relies exclusively on the Beer--Lambert law, effectively preventing the generation of metal-induced artifacts during the backprojection process commonly implemented in conventional methods. Extensive experimental evaluations demonstrate that the proposed method effectively reduces metal artifacts while providing high-quality image reconstructions, thus emphasizing the significance of the second image in capturing the nonlinear beam-hardening factor.", "output": "Neural Representation-Based Method for Metal-induced Artifact Reduction in Dental CBCT Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by effectively capturing global feature correlations. However, the integration of the Transformer modules may result in the loss of local contextual information during the global feature fusion process. To overcome these challenges, we propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA). The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures the local correlations using multiple Multi-scale Cross Perceptron modules, facilitating the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure to address the semantic segmentation of the image involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including the open large-scale dataset of CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance. The code is available at", "output": "MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The purpose of multi-object tracking (MOT) is to continuously track and identify objects detected in videos. Currently, most methods for multi-object tracking model the motion information and combine it with appearance information to determine and track objects. In this paper, unfalsified control is employed to address the ID-switch problem in multi-object tracking. We establish sequences of appearance information variations for the trajectories during the tracking process and design a detection and rectification module specifically for ID-switch detection and recovery.
We also propose a simple and effective strategy to address the issue of ambiguous matching of appearance information during the data association process. Experimental results on publicly available MOT datasets demonstrate that the tracker exhibits excellent effectiveness and robustness in handling tracking errors caused by occlusions and rapid movements.", "output": "The detection and rectification for identity-switch based on unfalsified control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the original face in a video with a generated face of the target identity while retaining consistent facial attributes such as expression and orientation. Due to the high privacy of faces, the misuse of this technique can raise severe social concerns, drawing tremendous attention to defend against DeepFakes recently. In this paper, we describe a new proactive defense method called FakeTracer to expose face-swap DeepFakes via implanting traces in training. Compared to general face-synthesis DeepFake, the face-swap DeepFake is more complex as it involves identity change, is subjected to the encoding-decoding process, and is trained unsupervised, increasing the difficulty of implanting traces into the training phase. To effectively defend against face-swap DeepFake, we design two types of traces, sustainable trace (STrace) and erasable trace (ETrace), to be added to training faces. During the training, these manipulated faces affect the learning of the face-swap DeepFake model, enabling it to generate faces that only contain sustainable traces. In light of these two traces, our method can effectively expose DeepFakes by identifying them. Extensive experiments are conducted on the Celeb-DF dataset, compared with recent passive and proactive defense methods, and are studied thoroughly regarding various factors, corroborating the efficacy of our method on defending against face-swap DeepFake.", "output": "FakeTracer: Proactively Defending Against Face-swap DeepFakes via Implanting Traces in Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The presence of tertiary lymphoid structures (TLSs) on pancreatic pathological images is an important prognostic indicator of pancreatic tumors. Therefore, TLSs detection on pancreatic pathological images plays a crucial role in diagnosis and treatment for patients with pancreatic tumors. However, fully supervised detection algorithms based on deep learning usually require a large number of manual annotations, which is time-consuming and labor-intensive. In this paper, we aim to detect the TLSs in a manner of few-shot learning by proposing a weakly supervised segmentation network. We firstly obtain the lymphocyte density maps by combining a pretrained model for nuclei segmentation and a domain adversarial network for lymphocyte nuclei recognition.
Then, we establish a cross-scale attention guidance mechanism by jointly learning the coarse-scale features from the original histopathology images and fine-scale features from our designed lymphocyte density attention. A noise-sensitive constraint is introduced by an embedding signed distance function loss in the training procedure to reduce tiny prediction errors. Experimental results on two collected datasets demonstrate that our proposed method significantly outperforms the state-of-the-art segmentation-based algorithms in terms of TLSs detection accuracy. Additionally, we apply our method to study the congruent relationship between the density of TLSs and peripancreatic vascular invasion and obtain some statistical results of clinical relevance.", "output": "A Weakly Supervised Segmentation Network Embedding Cross-scale Attention Guidance and Noise-sensitive Constraint for Detecting Tertiary Lymphoid Structures of Pancreatic Tumors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud analysis (such as 3D segmentation and detection) is a challenging task, because of not only the irregular geometries of many millions of unordered points, but also the great variations caused by depth, viewpoint, occlusion, etc. Current studies put much focus on the adaptation of neural networks to the complex geometries of point clouds, but are blind to a fundamental question: how to learn an appropriate point embedding space that is aware of both discriminative semantics and challenging variations? As a response, we propose a clustering based supervised learning scheme for point cloud analysis. Unlike the current de-facto scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space for automatically discovering subclass patterns which are latent yet representative across scenes. The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations. Our algorithm is principled and readily pluggable to modern point cloud segmentation networks during training, without extra overhead during testing. With various 3D network architectures (i.e., voxel-based, point-based, Transformer-based, automatically searched), our algorithm shows notable improvements on famous point cloud segmentation datasets (i.e., 2.0-2.6% on single-scan and 2.0-2.2% on multi-scan of SemanticKITTI, 1.8-1.9% on S3DIS, in terms of mIoU). Our algorithm also demonstrates utility in 3D detection, showing 2.0-3.4% mAP gains on KITTI.", "output": "Clustering based Point Cloud Representation Learning for 3D Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent label mix-based augmentation methods have shown their effectiveness in generalization despite their simplicity, and their favorable effects are often attributed to semantic-level augmentation. However, we found that they are vulnerable to highly skewed class distribution, because scarce data classes are rarely sampled for inter-class perturbation. We propose TextManiA, a text-driven manifold augmentation method that semantically enriches visual feature spaces, regardless of data distribution.
TextManiA augments visual data with intra-class semantic perturbation by exploiting easy-to-understand visually mimetic words, i.e., attributes. To this end, we bridge between the text representation and a target visual feature space, and propose an efficient vector augmentation. To empirically support the validity of our design, we devise two visualization-based analyses and show the plausibility of the bridge between two different modality spaces. Our experiments demonstrate that TextManiA is powerful in scarce samples with class imbalance as well as even distribution. We also show compatibility with the label mix-based approaches in evenly distributed scarce data.", "output": "TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifying and segmenting patterns from a limited number of examples is a significant challenge in remote sensing and earth observation due to the difficulty in acquiring accurately labeled data in large quantities. Previous studies have shown that meta-learning, which involves episodic training on query and support sets, is a promising approach. However, there has been little attention paid to direct fine-tuning techniques. This paper repurposes contrastive learning as a pre-training method for few-shot learning for classification and semantic segmentation tasks. Specifically, we introduce a generator-based contrastive learning framework (GenCo) that pre-trains backbones and simultaneously explores variants of feature samples. In fine-tuning, the auxiliary generator can be used to enrich limited labeled data samples in feature space. We demonstrate the effectiveness of our method in improving few-shot learning performance on two key remote sensing datasets: Agriculture-Vision and EuroSAT. Empirically, our approach outperforms purely supervised training on the nearly 95,000 images in Agriculture-Vision for both classification and semantic segmentation tasks. Similarly, the proposed few-shot method achieves better results on the land-cover classification task on EuroSAT compared to the results obtained from fully supervised model training on the dataset.", "output": "GenCo: An Auxiliary Generator from Contrastive Learning for Enhanced Few-Shot Learning in Remote Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Occlusion is a common problem with biometric recognition in the wild. The generalization ability of CNNs greatly decreases due to the adverse effects of various occlusions. To this end, we propose a novel unified framework integrating the merits of both CNNs and graph models to overcome occlusion problems in biometric recognition, called multiscale dynamic graph representation (MS-DGR). More specifically, a group of deep features reflected on certain subregions is recrafted into a feature graph (FG). Each node inside the FG is deemed to characterize a specific local region of the input sample, and the edges imply the co-occurrence of non-occluded regions. By analyzing the similarities of the node representations and measuring the topological structures stored in the adjacency matrix, the proposed framework leverages dynamic graph matching to judiciously discard the nodes corresponding to the occluded parts.
The multiscale strategy is further incorporated to attain more diverse nodes representing regions of various sizes. Furthermore, the proposed framework exhibits a more illustrative and reasonable inference by showing the paired nodes. Extensive experiments demonstrate the superiority of the proposed framework, which boosts the accuracy in both natural and occlusion-simulated cases by a large margin compared with that of baseline methods.", "output": "Multiscale Dynamic Graph Representation for Biometric Recognition with Occlusions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input. Unlike existing indoor 3D detection methods that struggle to model scene geometry, our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. Furthermore, we subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. Our method outperforms state-of-the-arts by 3.9 mAP and 3.1 mAP on the ScanNet and ARKITScenes benchmarks, respectively. We provide extensive analysis to shed light on how NeRF-Det works. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without requiring per-scene optimization. Code is available at", "output": "NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes. We observe that it is essentially due to not only the scale-ambiguous problem but also the focal-ambiguous problem that decreases the generalization ability of monocular depth estimation. That is, images may be captured by cameras of different focal lengths in scenes of different scales. In this paper, we develop a focal-and-scale depth estimation model to well learn absolute depth maps from single images in unseen indoor scenes. First, a relative depth estimation network is adopted to learn relative depths from single images with diverse scales/semantics. Second, multi-scale features are generated by mapping a single focal length value to focal length features and concatenating them with intermediate features of different scales in relative depth estimation. Finally, relative depths and multi-scale features are jointly fed into an absolute depth estimation network. In addition, a new pipeline is developed to augment the diversity of focal lengths of public datasets, which are often captured with cameras of the same or similar focal lengths. Our model is trained on augmented NYUDv2 and tested on three unseen datasets.
Our model considerably improves the generalization ability of depth estimation by 41%/13% (RMSE) with/without data augmentation compared with five recent SOTAs and alleviates the deformation problem in 3D reconstruction. Notably, our model maintains the accuracy of depth estimation on the original NYUDv2.", "output": "FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "360{\\deg} images can provide an omnidirectional field of view, which is important for stable and long-term scene perception. In this paper, we explore 360{\\deg} images for visual object tracking and perceive new challenges caused by large distortion, stitching artifacts, and other unique attributes of 360{\\deg} images. To alleviate these problems, we take advantage of novel representations of target localization, i.e., bounding field-of-view, and then introduce a general 360 tracking framework that can adopt typical trackers for omnidirectional tracking. More importantly, we propose a new large-scale omnidirectional tracking benchmark dataset, 360VOT, in order to facilitate future research. 360VOT contains 120 sequences with up to 113K high-resolution frames in equirectangular projection. The tracking targets cover 32 categories in diverse scenarios. Moreover, we provide 4 types of unbiased ground truth, including (rotated) bounding boxes and (rotated) bounding field-of-views, as well as new metrics tailored for 360{\\deg} images which allow for the accurate evaluation of omnidirectional tracking performance. Finally, we extensively evaluated 20 state-of-the-art visual trackers and provided a new baseline for future comparisons. Homepage: ", "output": "360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences.
Future generative AI approaches can use the resulting tool to validate their reports, leading to a more responsible use of AI in expediting clinical workflows.", "output": "Fact-Checking of AI-Generated Reports."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Facial expression is related to facial muscle contractions and different muscle movements correspond to different emotional states. For micro-expression recognition, the muscle movements are usually subtle, which has a negative impact on the performance of current facial emotion recognition algorithms. Most existing methods use self-attention mechanisms to capture relationships between tokens in a sequence, but they do not take into account the inherent spatial relationships between facial landmarks. This can result in sub-optimal performance on micro-expression recognition tasks. Therefore, learning to recognize facial muscle movements is a key challenge in the area of micro-expression recognition. In this paper, we propose a Hierarchical Transformer Network (HTNet) to identify critical areas of facial muscle movement. HTNet includes two major components: a transformer layer that leverages the local temporal features and an aggregation layer that extracts local and global semantical facial features. Specifically, HTNet divides the face into four different facial areas: left lip area, left eye area, right eye area and right lip area. The transformer layer is used to focus on representing local minor muscle movement with local self-attention in each area. The aggregation layer is used to learn the interactions between eye areas and lip areas. The experiments on four publicly available micro-expression datasets show that the proposed approach outperforms previous methods by a large margin. The codes and models are available at:", "output": "HTNet for micro-expression recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the absence of fine structure and texture information, existing fusion-based few-shot image generation methods suffer from unsatisfactory generation quality and diversity. To address this problem, we propose a novel feature Equalization fusion Generative Adversarial Network (EqGAN) for few-shot image generation. Unlike existing fusion strategies that rely on either deep features or local representations, we design two separate branches to fuse structures and textures by disentangling encoded features into shallow and deep contents. To refine image contents at all feature levels, we equalize the fused structure and texture semantics at different scales and supplement the decoder with richer information by skip connections. Since the fused structures and textures may be inconsistent with each other, we devise a consistent equalization loss between the equalized features and the intermediate output of the decoder to further align the semantics.
Comprehensive experiments on three public datasets demonstrate that EqGAN not only significantly improves generation performance with FID score (by up to 32.7%) and LPIPS score (by up to 4.19%), but also outperforms the state-of-the-arts in terms of accuracy (by up to 1.97%) for downstream classification tasks.", "output": "EqGAN: Feature Equalization Fusion for Few-shot Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis. Considering the wavelet transform represents the image in spatial and frequency domains, we carefully design a novel architecture SFUNet to effectively capture the correlation for both domains. Specifically, in the standard denoising U-Net for pixel data, we supplement the 2D convolutions and spatial-only attention layers with our spatial frequency-aware convolution and attention modules to jointly model the complementary information from spatial and frequency domains in wavelet data. Our new architecture can be used as a drop-in replacement to the pixel-based network and is compatible with the vanilla DDPM training process. By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on CIFAR-10, FFHQ, LSUN-Bedroom, and LSUN-Church datasets than the pixel-based counterpart.", "output": "Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current deep learning methods for low-light image enhancement (LLIE) typically rely on pixel-wise mapping learned from paired data. However, these methods often overlook the importance of considering degradation representations, which can lead to sub-optimal outcomes. In this paper, we address this limitation by proposing a degradation-aware learning scheme for LLIE using diffusion models, which effectively integrates degradation and image priors into the diffusion process, resulting in improved image enhancement. Our proposed degradation-aware learning scheme is based on the understanding that degradation representations play a crucial role in accurately modeling and capturing the specific degradation patterns present in low-light images. To this end, first, a joint learning framework for both image generation and image enhancement is presented to learn the degradation representations. Second, to leverage the learned degradation representations, we develop a Low-Light Diffusion model (LLDiffusion) with a well-designed dynamic diffusion module. This module takes into account both the color map and the latent degradation representations to guide the diffusion process. By incorporating these conditioning factors, the proposed LLDiffusion can effectively enhance low-light images, considering both the inherent degradation patterns and the desired color fidelity. Finally, we evaluate our proposed method on several well-known benchmark datasets, including synthetic and real-world unpaired datasets. Extensive experiments on public benchmarks demonstrate that our LLDiffusion outperforms state-of-the-art LLIE methods both quantitatively and qualitatively.
The source code and pre-trained models are available at", "output": "LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Physical adversarial attacks have posed a severe threat to DNN-based object detectors. To enhance security, a combination of visible and infrared sensors is deployed in various scenarios, which has proven effective in disabling existing single-modal physical attacks. To further demonstrate the potential risks in such cases, we design a unified adversarial patch that can perform cross-modal physical attacks, achieving evasion in both modalities simultaneously with a single patch. Given the different imaging mechanisms of visible and infrared sensors, our work manipulates patches' shape features, which can be captured in different modalities when they undergo changes. To deal with these challenges, we propose a novel boundary-limited shape optimization approach that aims to achieve compact and smooth shapes for the adversarial patch, making it easy to implement in the physical world. A score-aware iterative evaluation method is also introduced to balance the fooling degree between visible and infrared detectors during optimization, which guides the adversarial patch to iteratively reduce the predicted scores of the multi-modal sensors. Furthermore, we propose an Affine-Transformation-based enhancement strategy that makes the learnable shape robust to various angles, thus mitigating the issue of shape deformation caused by different shooting angles in the real world. Our method is evaluated against several state-of-the-art object detectors, achieving an Attack Success Rate (ASR) of over 80%. We also demonstrate the effectiveness of our approach in physical-world scenarios under various settings, including different angles, distances, postures, and scenes for both visible and infrared sensors.", "output": "Unified Adversarial Patch for Visible-Infrared Cross-modal Attacks in the Physical World."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised Out-of-Distribution (OOD) detection consists in identifying anomalous regions in images leveraging only models trained on images of healthy anatomy. An established approach is to tokenize images and model the distribution of tokens with Auto-Regressive (AR) models. AR models are used to 1) identify anomalous tokens and 2) in-paint anomalous representations with in-distribution tokens. However, AR models are slow at inference time and prone to error accumulation issues which negatively affect OOD detection performance. Our novel method, MIM-OOD, overcomes both speed and error accumulation issues by replacing the AR model with two task-specific networks: 1) a transformer optimized to identify anomalous tokens and 2) a transformer optimized to in-paint anomalous tokens using masked image modelling (MIM).
Our experiments with brain MRI anomalies show that MIM-OOD substantially outperforms AR models (DICE 0.458 vs 0.301) while achieving a nearly 25x speedup (9.5s vs 244s).", "output": "MIM-OOD: Generative Masked Image Modelling for Out-of-Distribution Detection in Medical Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Thanks to High Dynamic Range (HDR) imaging methods, the scope of photography has seen profound changes recently. To be more specific, such methods try to reconstruct the lost luminosity of the real world caused by the limitation of regular cameras from the Low Dynamic Range (LDR) images. Additionally, although the State-Of-The-Art methods in this topic perform well, they mainly concentrate on combining different exposures and pay less attention to extracting the informative parts of the images. Thus, this paper aims to introduce a new model capable of incorporating information from the most visible areas of each image extracted by a visual attention module (VAM), which is a result of a segmentation strategy. In particular, the model, based on a deep learning architecture, utilizes the extracted areas to produce the final HDR image. The results demonstrate that our method outperformed most of the State-Of-The-Art algorithms.", "output": "High Dynamic Range Imaging via Visual Attention Modules."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The success of automated medical image analysis depends on large-scale and expert-annotated training sets. Unsupervised domain adaptation (UDA) has been raised as a promising approach to alleviate the burden of labeled data collection. However, UDA methods generally operate under the closed-set adaptation setting assuming an identical label set between the source and target domains, which is over-restrictive in clinical practice where new classes commonly exist across datasets due to taxonomic inconsistency. While several methods have been presented to tackle both domain shifts and incoherent label sets, none of them take into account the common characteristics of the two issues and consider the learning dynamics along network training. In this work, we propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective. It exploits the low-rank nature of gradient space and devises a dual-stream distillation algorithm to regularize the learning dynamics of insufficiently annotated domain and classes with the external guidance obtained from reliable sources. Our approach resolves the issue of inadequate navigation along network optimization, which is the major obstacle in the taxonomy adaptive cross-domain adaptation scenario. We evaluate the proposed method extensively on several tasks towards various endpoints with clinical and open-world significance.
The results demonstrate its effectiveness and improvements over previous methods.", "output": "Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters in the mathematical formula that generate them. In the present work, we hypothesize that the process for generating different instances for the same category in FDSL can be viewed as a form of data augmentation. We validate this hypothesis by replacing the instances with data augmentation, which means we only need a single image per category. Our experiments show that this one-instance fractal database (OFDB) performs better than the original dataset where instances were explicitly generated. We further scale up OFDB to 21,000 categories and show that it matches, or even surpasses, the model pre-trained on ImageNet-21k in ImageNet-1k fine-tuning. The number of images in OFDB is 21k, whereas ImageNet-21k has 14M. This opens new possibilities for pre-training vision transformers with much smaller datasets.", "output": "Pre-training Vision Transformers with Very Limited Synthesized Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gait, the manner of walking, has been proven to be a reliable biometric with uses in surveillance, marketing and security. A promising new direction for the field is training gait recognition systems without explicit human annotations, through self-supervised learning approaches. Such methods are heavily reliant on strong augmentations for the same walking sequence to induce more data variability and to simulate additional walking variations. Current data augmentation schemes are heuristic and cannot provide the necessary data variation as they are only able to provide simple temporal and spatial distortions. In this work, we propose GaitMorph, a novel method to modify the walking variation for an input gait sequence. Our method entails the training of a high-compression model for gait skeleton sequences that leverages unlabelled data to construct a discrete and interpretable latent space, which preserves identity-related features. Furthermore, we propose a method based on optimal transport theory to learn latent transport maps on the discrete codebook that morph gait sequences between variations.
We perform extensive experiments and show that our method is suitable to synthesize additional views for an input sequence.", "output": "GaitMorph: Transforming Gait by Optimally Transporting Discrete Codes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Single-frame infrared small target detection is considered a challenging task, due to the extreme imbalance between target and background, the extreme sensitivity of bounding box regression to infrared small targets, and the ease with which small target information is lost in the high-level semantic layers. In this paper, we propose an enhancing feature learning network (EFLNet) based on the YOLOv7 framework to solve these problems. First, we notice that there is an extreme imbalance between the target and the background in the infrared image, which makes the model pay more attention to the background features, resulting in missed detections. To address this problem, we propose a new adaptive threshold focal loss function that adjusts the loss weight automatically, compelling the model to allocate greater attention to target features. Second, we introduce the normalized Gaussian Wasserstein distance to alleviate the difficulty of model convergence caused by the extreme sensitivity of the bounding box regression to infrared small targets. Finally, we incorporate a dynamic head mechanism into the network to enable adaptive learning of the relative importance of each semantic layer. Experimental results demonstrate that our method achieves better detection performance on infrared small targets than state-of-the-art deep-learning-based methods.", "output": "EFLNet: Enhancing Feature Learning for Infrared Small Target Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces vox2vec - a contrastive method for self-supervised learning (SSL) of voxel-level representations. vox2vec representations are modeled by a Feature Pyramid Network (FPN): a voxel representation is a concatenation of the corresponding feature vectors from different pyramid levels. The FPN is pre-trained to produce similar representations for the same voxel in different augmented contexts and distinctive representations for different voxels. This results in unified multi-scale representations that capture both global semantics (e.g., body part) and local semantics (e.g., different small organs or healthy versus tumor tissue). We use vox2vec to pre-train a FPN on more than 6500 publicly available computed tomography images. We evaluate the pre-trained representations by attaching simple heads on top of them and training the resulting models for 22 segmentation tasks. We show that vox2vec outperforms existing medical imaging SSL techniques in three evaluation setups: linear and non-linear probing and end-to-end fine-tuning. Moreover, a non-linear head trained on top of the frozen vox2vec representations achieves competitive performance with the FPN trained from scratch while having 50 times fewer trainable parameters.
The code is available at .", "output": "vox2vec: A Framework for Self-supervised Contrastive Learning of Voxel-level Representations in Medical Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud completion aims to recover the complete shape based on a partial observation. Existing methods require either complete point clouds or multiple partial observations of the same object for learning. In contrast to previous approaches, we present Partial2Complete (P2C), the first self-supervised framework that completes point cloud objects using training samples consisting of only a single incomplete point cloud per object. Specifically, our framework groups incomplete point clouds into local patches as input and predicts masked patches by learning prior information from different partial objects. We also propose Region-Aware Chamfer Distance to regularize shape mismatch without limiting completion capability, and devise the Normal Consistency Constraint to incorporate a local planarity assumption, encouraging the recovered shape surface to be continuous and complete. In this way, P2C no longer needs multiple observations or complete point clouds as ground truth. Instead, structural cues are learned from a category-specific dataset to complete partial point clouds of objects. We demonstrate the effectiveness of our approach on both synthetic ShapeNet data and real-world ScanNet data, showing that P2C produces comparable results to methods trained with complete shapes, and outperforms methods learned with multiple partial observations. Code is available at ", "output": "P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved by either designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs in four biomedical tasks and a diverse range of distribution shifts. Based on the result that none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. On the basis of various examples, we demonstrate how this tool can help researchers gain insight into the requirements for safe application of classification systems in the medical domain.
The open-source benchmark and tool are at:", "output": "Understanding Silent Failures in Medical Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While the design of blind image quality assessment (IQA) algorithms has improved significantly, the distribution shift between the training and testing scenarios often leads to a poor performance of these methods at inference time. This motivates the study of test time adaptation (TTA) techniques to improve their performance at inference time. Existing auxiliary tasks and loss functions used for TTA may not be relevant for quality-aware adaptation of the pre-trained model. In this work, we introduce two novel quality-relevant auxiliary tasks at the batch and sample levels to enable TTA for blind IQA. In particular, we introduce a group contrastive loss at the batch level and a relative rank loss at the sample level to make the model quality aware and adapt to the target data. Our experiments reveal that even using a small batch of images from the test distribution helps achieve significant improvement in performance by updating the batch normalization statistics of the source model.", "output": "Test Time Adaptation for Blind Image Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic inpainting or image completion alludes to the task of inferring arbitrarily large missing regions in images based on image semantics. Since the prediction of image pixels requires an indication of high-level context, this makes it significantly tougher than image completion, which is often more concerned with correcting data corruption and removing entire objects from the input image. On the other hand, image enhancement attempts to eliminate unwanted noise and blur from the image, along with sustaining most of the image details. An efficient image completion and enhancement model should be able to recover the corrupted and masked regions in images and then refine the image further to increase the quality of the output image. Generative Adversarial Networks (GANs) have turned out to be helpful in picture completion tasks. In this chapter, we will discuss the underlying GAN architecture and how GANs can be used for image completion tasks.", "output": "Semantic Image Completion and Enhancement using GANs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training an image captioner without annotated image-sentence pairs has gained traction in recent years. Previous approaches can be categorized into two strategies: crawling sentences from mismatching corpora and aligning them with the given images as pseudo annotations, or pre-training the captioner using external image-text pairs. However, the aligning setting seems to reach its performance limit due to the quality problem of pairs, and pre-training requires significant computational resources.
To address these challenges, we propose a new strategy ``LPM + retrieval-augmented learning\" where the prior knowledge from large pre-trained models (LPMs) is leveraged as supervision, and a retrieval process is integrated to further reinforce its effectiveness. Specifically, we introduce Retrieval-augmented Pseudo Sentence Generation (RaPSG), which adopts an efficient approach to retrieve highly relevant short region descriptions from the mismatching corpora and use them to generate a variety of pseudo sentences with distinct representations as well as high quality via LPMs. In addition, a fluency filter and a CLIP-guided training objective are further introduced to facilitate model optimization. Experimental results demonstrate that our method surpasses the SOTA pre-training model (Flamingo3B) by achieving a CIDEr score of 78.1 (+5.1) while utilizing only 0.3% of its trainable parameters (1.3B vs 33M). Importantly, our approach eliminates the need for computationally expensive pre-training processes on external datasets (e.g., the requirement of 312M image-text pairs for Flamingo3B). We further show that with a simple extension, the generated pseudo sentences can be deployed as weak supervision to boost the 1% semi-supervised image caption benchmark up to a 93.4 CIDEr score (+8.9), which showcases the versatility and effectiveness of our approach.", "output": "Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT). However, the scarcity of gloss-annotated sign language data, combined with the information bottleneck in the mid-level gloss representation, has hindered the further development of the SLT task. To address this challenge, we propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP), which improves SLT by inheriting language-oriented prior knowledge from pre-trained models, without any gloss annotation assistance. Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training (CLIP) with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage. The seamless combination of these novel designs forms a robust sign language representation and significantly improves gloss-free sign language translation. In particular, we have achieved unprecedented improvements in terms of BLEU-4 score on the PHOENIX14T dataset (>+5) and the CSL-Daily dataset (>+3) compared to state-of-the-art gloss-free SLT methods. Furthermore, our approach also achieves competitive results on the PHOENIX14T dataset when compared with most of the gloss-based methods.
Our code is available at ", "output": "Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D-aware face generators are commonly trained on 2D real-life face image datasets. Nevertheless, existing facial recognition methods often struggle to extract face data captured from various camera angles. Furthermore, in-the-wild images with diverse body poses introduce a high-dimensional challenge for 3D-aware generators, making it difficult to utilize data that contains complete neck and shoulder regions. Consequently, these face image datasets often contain only near-frontal face data, which poses challenges for 3D-aware face generators to construct \\textit{full-head} 3D portraits. To this end, we first create the dataset $\\it{360}^{\\circ}$-\\textit{Portrait}-\\textit{HQ} (\\textit{$\\it{360}^{\\circ}$PHQ}), which consists of high-quality single-view real portraits annotated with a variety of camera parameters (the yaw angles span the entire $360^{\\circ}$ range) and body poses. We then propose \\textit{3DPortraitGAN}, the first 3D-aware full-head portrait generator that learns a canonical 3D avatar distribution from the body-pose-various \\textit{$\\it{360}^{\\circ}$PHQ} dataset with body pose self-learning. Our model can generate view-consistent portrait images from all camera angles (${360}^{\\circ}$) with a full-head 3D representation. We incorporate a mesh-guided deformation field into volumetric rendering to produce deformed results, generating portrait images that conform to the body pose distribution of the dataset using our canonical generator. We integrate two pose predictors into our framework to predict more accurate body poses, addressing the issue of inaccurately estimated body poses in our dataset. Our experiments show that the proposed framework can generate view-consistent, realistic portrait images with complete geometry from all camera angles and accurately predict portrait body pose.", "output": "Learning Full-Head 3D GANs from a Single-View Portrait Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "LiDAR-generated point clouds are crucial for perceiving outdoor environments. The segmentation of point clouds is also essential for many applications. Previous research has focused on using self-attention and convolution (local attention) mechanisms individually in semantic segmentation architectures. However, there is limited work on combining the learned representations of these attention mechanisms to improve performance. Additionally, existing research that combines convolution with self-attention relies on global attention, which is not practical for processing large point clouds. To address these challenges, this study proposes a new architecture, pCTFusion, which combines kernel-based convolutions and self-attention mechanisms for better feature learning and capturing local and global dependencies in segmentation. The proposed architecture employs two types of self-attention mechanisms, local and global, based on the hierarchical positions of the encoder blocks. Furthermore, the existing loss functions do not consider the semantic and position-wise importance of the points, resulting in reduced accuracy, particularly at sharp class boundaries.
To overcome this, the study introduces a novel attention-based loss function called Pointwise Geometric Anisotropy (PGA), which assigns weights based on the semantic distribution of points in a neighborhood. The proposed architecture is evaluated on the SemanticKITTI outdoor dataset and shows a 5-7% improvement in performance compared to the state-of-the-art architectures. The results are particularly encouraging for minor classes, which are often misclassified due to class imbalance, lack of space, and neighbor-aware feature encoding. These developed methods can be leveraged for the segmentation of complex datasets and can drive real-world applications of LiDAR point clouds.", "output": "pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge amalgamation (KA) aims to learn a compact student model to handle the joint objective from multiple teacher models that are each specialized for their own tasks. Current methods focus on coarsely aligning teachers and students in the common representation space, making it difficult for the student to learn the proper decision boundaries from a set of heterogeneous teachers. Besides, the KL divergence in previous works only minimizes the probability distribution difference between teachers and the student, ignoring the intrinsic characteristics of teachers. Therefore, we propose a novel Contrastive Knowledge Amalgamation (CKA) framework, which introduces contrastive losses and an alignment loss to achieve intra-class cohesion and inter-class separation. Intra- and inter-model contrastive losses are designed to widen the distance between representations of different classes. The alignment loss is introduced to minimize the sample-level distribution differences of teacher-student models in the common representation space. Furthermore, the student learns heterogeneous unsupervised classification tasks through soft targets efficiently and flexibly in the task-level amalgamation. Extensive experiments on benchmarks demonstrate the generalization capability of CKA in the amalgamation of a specific task as well as multiple tasks. Comprehensive ablation studies provide further insight into our CKA.", "output": "Contrastive Knowledge Amalgamation for Unsupervised Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Depth-aware panoptic segmentation is an emerging topic in computer vision which combines semantic and geometric understanding for more robust scene interpretation. Recent works pursue unified frameworks to tackle this challenge but mostly still treat it as two individual learning tasks, which limits their potential for exploring cross-domain information. We propose a deeply unified framework for depth-aware panoptic segmentation, which performs joint segmentation and depth estimation both in a per-segment manner with identical object queries. To narrow the gap between the two tasks, we further design a geometric query enhancement method, which is able to integrate scene geometry into object queries using latent representations. In addition, we propose a bi-directional guidance learning approach to facilitate cross-task feature learning by taking advantage of their mutual relations.
Our method sets the new state of the art for depth-aware panoptic segmentation on both the Cityscapes-DVPS and SemKITTI-DVPS datasets. Moreover, our guidance learning approach is shown to deliver performance improvement even under incomplete supervision labels.", "output": "Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Residual connections have been proposed as an architecture-based inductive bias to mitigate the problem of exploding and vanishing gradients and increase task performance in both feed-forward and recurrent networks (RNNs) when trained with the backpropagation algorithm. Yet, little is known about how residual connections in RNNs influence their dynamics and fading memory properties. Here, we introduce weakly coupled residual recurrent networks (WCRNNs) in which residual connections result in well-defined Lyapunov exponents and allow for studying properties of fading memory. We investigate how the residual connections of WCRNNs influence their performance, network dynamics, and memory properties on a set of benchmark tasks. We show that several distinct forms of residual connections yield effective inductive biases that result in increased network expressivity. In particular, residual connections that (i) result in network dynamics at the proximity of the edge of chaos, (ii) allow networks to capitalize on characteristic spectral properties of the data, and (iii) result in heterogeneous memory properties are shown to increase practical expressivity. In addition, we demonstrate how our results can be extended to non-linear residuals and introduce a weakly coupled residual initialization scheme that can be used for Elman RNNs.", "output": "Fading memory as inductive bias in residual recurrent networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fine-grained classification is a particular case of a classification problem, aiming to classify objects that share the visual appearance and can only be distinguished by subtle differences. Fine-grained classification models are often deployed to determine animal species or individuals in automated animal monitoring systems. Precise visual explanations of the model's decision are crucial to analyze systematic errors. Attention- or gradient-based methods are commonly used to identify regions in the image that contribute the most to the classification decision. These methods deliver either too coarse or too noisy explanations, unsuitable for identifying subtle visual differences reliably. However, perturbation-based methods can precisely identify pixels causally responsible for the classification result. The fill-in of the dropout (FIDO) algorithm is one of those methods. It utilizes concrete dropout (CD) to sample a set of attribution masks and updates the sampling parameters based on the output of the classification model. A known problem of the algorithm is a high variance in the gradient estimates, which the authors have mitigated until now by mini-batch updates of the sampling parameters. This paper presents a solution to circumvent these computational instabilities by simplifying the CD sampling and reducing reliance on large mini-batch sizes.
First, it allows estimating the parameters with smaller mini-batch sizes without losing the quality of the estimates, but with a reduced computational effort. Furthermore, our solution produces finer and more coherent attribution masks. Finally, we use the resulting attribution masks to improve the classification performance of a trained model without additional fine-tuning of the model.", "output": "Simplified Concrete Dropout -- Improving the Generation of Attribution Masks for Fine-grained Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Purpose: Chest X-ray (CXR) is an essential tool and one of the most prescribed imaging examinations for detecting pulmonary abnormalities, with a yearly estimate of over 2 billion performed worldwide. However, the accurate and timely diagnosis of TB remains an unmet goal. The prevalence of TB is highest in low-middle-income countries, and a portable, automated, and reliable solution is required. In this study, we compared the performance of DL-based devices on digital and analog CXR. The evaluated DL-based device can be used in resource-constrained settings. Methods: A total of 10,000 CXR DICOMs (.dcm) and printed photos of the films acquired with three different cellular phones (Samsung S8, iPhone 8, and iPhone XS), along with their radiological reports, were retrospectively collected from various sites across India from April 2020 to March 2021. Results: 10,000 chest X-rays were utilized to evaluate the DL-based device in identifying radiological signs of TB. The AUC of qXR for detecting signs of tuberculosis on the original DICOMs dataset was 0.928, with a sensitivity of 0.841 at a specificity of 0.806. At an optimal threshold, the difference in the AUC of the three cellular smartphones from the original DICOMs is 0.024 (2.55%), 0.048 (5.10%), and 0.038 (1.91%). The minimal differences demonstrate the robustness of the DL-based device in identifying radiological signs of TB in both digital and analog CXR.", "output": "Comparative Evaluation of Digital and Analog Chest Radiographs to Identify Tuberculosis using Deep Learning Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Advanced image tampering techniques are increasingly challenging the trustworthiness of multimedia, leading to the development of Image Manipulation Localization (IML). But what makes a good IML model? The answer lies in the way to capture artifacts. Exploiting artifacts requires the model to extract non-semantic discrepancies between the manipulated and authentic regions, which needs to compare differences between these two areas explicitly. With the self-attention mechanism, naturally, the Transformer is the best candidate. Besides, artifacts are sensitive to image resolution, amplified under multi-scale features, and massive at the manipulation border. Therefore, we formulate the answer to the former question as building a ViT with high-resolution capacity, multi-scale feature extraction capability, and manipulation edge supervision. We term this simple but effective ViT paradigm IML-ViT, which has great potential to become a new benchmark for IML. Extensive experiments on five benchmark datasets verified that our model outperforms the state-of-the-art manipulation localization methods.
Code and models are available at \\url{", "output": "IML-ViT: Image Manipulation Localization by Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents a new unsupervised framework for training deep learning models for super-resolution of Sentinel-2 images by fusion of its 10-m and 20-m bands. The proposed scheme avoids the resolution downgrade process needed to generate training data in the supervised case. On the other hand, a proper loss that accounts for cycle-consistency between the network prediction and the input components to be fused is proposed. Despite its unsupervised nature, in our preliminary experiments the proposed scheme has shown promising results in comparison to the supervised approach. Besides, by construction of the proposed loss, the resulting trained network can be ascribed to the class of multi-resolution analysis methods.", "output": "A full-resolution training framework for Sentinel-2 image fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets. Current methods primarily aim to either reduce model size or utilize pre-trained models, limiting their adaptability to various backbone architectures. This paper investigates the issue of over-sampled frames, a prevalent problem in many approaches that has nevertheless received relatively little attention. Despite the use of fewer frames being a potential solution, this approach often results in a substantial decline in performance. To address this issue, we propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames. This feature restoration technique brings a negligible increase in computational requirements compared to resource-intensive image encoders, such as ViT. To evaluate the effectiveness of our method, we conduct extensive experiments on four public datasets, including Kinetics-400, ActivityNet, UCF-101, and HMDB-51. With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy. In addition, our method also surprisingly helps improve the generalization ability of the models under zero-shot settings.", "output": "Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios.
Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR to image projections. Our approach outperforms state-of-the-art results by up to $\\sim$ 13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.", "output": "Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning is a popular method because of its ability to learn features from images without using labels, and it is able to overcome the limited labeled datasets used in supervised learning. Self-supervised learning works by training the model on a pretext task before applying it to a specific downstream task. There are some examples of pretext tasks used in self-supervised learning in the field of image recognition, namely rotation prediction, solving jigsaw puzzles, and predicting relative positions in an image. Previous studies have only used one type of transformation as a pretext task. This raises the question of what the effect is if more than one pretext task is used, with a gating network employed to combine all pretext tasks. Therefore, we propose the Gated Self-Supervised Learning method to improve image classification, which uses more than one transformation as pretext tasks and uses the Mixture of Experts architecture as a gating network to combine each pretext task, so that the model can automatically study and focus more on the most useful augmentations for classification. We test the performance of the proposed method in several scenarios, namely CIFAR imbalanced dataset classification, adversarial perturbations, Tiny-Imagenet dataset classification, and semi-supervised learning. Moreover, Grad-CAM and T-SNE analyses are used to show that the proposed method identifies important features that influence image classification, represents data for each class, and separates different classes properly. Our code is in", "output": "Mixture of Self-Supervised Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent surge of foundation models in computer vision and natural language processing opens up perspectives in utilizing multi-modal clinical data to train large models with strong generalizability. Yet pathological image datasets often lack biomedical text annotation and enrichment. Guiding data-efficient image diagnosis from the use of biomedical text knowledge becomes a substantial interest. In this paper, we propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification.
CITE injects text insights gained from language models pre-trained with a broad range of biomedical texts, adapting foundation models towards pathological image understanding. Through extensive experiments on the PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE achieves leading performance compared with various baselines, especially when training data is scarce. CITE offers insights into leveraging in-domain text knowledge to reinforce data-efficient pathological image classification. Code is available at ", "output": "Text-guided Foundation Model Adaptation for Pathological Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have as yet seen little translation to clinical practice; manual and computational evaluations of such large 3D data have so far been impractical and/or unable to provide patient-level clinical insights. Here we present Modality-Agnostic Multiple instance learning for volumetric Block Analysis (MAMBA), a deep-learning-based platform for processing 3D tissue images from diverse imaging modalities and predicting patient outcomes. Archived prostate cancer specimens were imaged with open-top light-sheet microscopy or microcomputed tomography, and the resulting 3D datasets were used to train risk-stratification networks based on 5-year biochemical recurrence outcomes via MAMBA. With the 3D block-based approach, MAMBA achieves an area under the receiver operating characteristic curve (AUC) of 0.86 and 0.74, superior to 2D traditional single-slice-based prognostication (AUC of 0.79 and 0.57), suggesting superior prognostication with 3D morphological features. Further analyses reveal that the incorporation of greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, suggesting the value of capturing larger extents of heterogeneous 3D morphology. With the rapid growth and adoption of 3D spatial biology and pathology techniques by researchers and clinicians, MAMBA provides a general and efficient framework for 3D weakly supervised learning for clinical decision support and can help to reveal novel 3D morphological biomarkers for prognosis and therapeutic response.", "output": "Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual AI systems are vulnerable to natural and synthetic physical corruption in the real world. Such corruption often arises unexpectedly and alters the model's performance. In recent years, the primary focus has been on adversarial attacks. However, natural corruptions (e.g., snow, fog, dust) are an omnipresent threat to visual AI systems and should be considered equally important. Many existing works propose interesting solutions to train robust models against natural corruption.
These works either leverage image augmentations, which come with the additional cost of model training, or place suspicious patches in the scene to design unadversarial examples. In this work, we propose the idea of naturalistic support artifacts (NSA) for robust prediction. The NSAs are shown to be beneficial in scenarios where model parameters are inaccessible and adding artifacts in the scene is feasible. The NSAs are natural-looking objects generated through artifact training using DC-GAN to have high visual fidelity in the scene. We test against natural corruptions on the Imagenette dataset and observe an improvement in prediction confidence score of four times. We also demonstrate NSA's capability to increase adversarial accuracy by 8% on average. Lastly, we qualitatively analyze NSAs using saliency maps to understand how they help improve prediction confidence.", "output": "NSA: Naturalistic Support Artifact to Boost Network Confidence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The demand for efficient 3D model generation techniques has grown exponentially, as manual creation of 3D models is time-consuming and requires specialized expertise. While generative models have shown potential in creating 3D textured shapes from 2D images, their applicability in 3D industries is limited due to the lack of a well-defined camera distribution in real-world scenarios, resulting in low-quality shapes. To overcome this limitation, we propose GET3D--, the first method that directly generates textured 3D shapes from 2D images with unknown pose and scale. GET3D-- comprises a 3D shape generator and a learnable camera sampler that captures the 6D external changes on the camera. In addition, we propose a novel training schedule to stably optimize both the shape generator and camera sampler in a unified framework. By controlling external variations using the learnable camera sampler, our method can generate aligned shapes with clear textures. Extensive experiments demonstrate the efficacy of GET3D--, which precisely fits the 6D camera pose distribution and generates high-quality shapes on both synthetic and realistic unconstrained datasets.", "output": "GET3D--: Learning GET3D from Unconstrained Image Collections."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the medical field, federated learning commonly deals with highly imbalanced datasets, including skin lesions and gastrointestinal images. Existing federated methods under highly imbalanced datasets primarily focus on optimizing a global model without incorporating the intra-class variations that can arise in medical imaging due to different populations, findings, and scanners. In this paper, we study the inter-client intra-class variations with publicly available self-supervised auxiliary networks. Specifically, we find that employing a shared auxiliary pre-trained model, like MoCo-V2, locally on every client yields consistent divergence measurements. Based on these findings, we derive a dynamic balanced model aggregation via self-supervised priors (MAS) to guide the global model optimization. Fed-MAS can be utilized with different local learning methods for effective model aggregation toward a highly robust and unbiased global model.
Our code is available at \\url{", "output": "Federated Model Aggregation via Self-Supervised Priors for Highly Imbalanced Medical Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at", "output": "Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Simulating camera sensors is a crucial task in autonomous driving. Although neural radiance fields are exceptional at synthesizing photorealistic views in driving simulations, they still fail in generating extrapolated views. This paper proposes to incorporate map priors into neural radiance fields to synthesize out-of-trajectory driving views with semantic road consistency. The key insight is that map information can be utilized as a prior to guide the training of the radiance fields with uncertainty. Specifically, we utilize the coarse ground surface as uncertain information to supervise the density field and warp depth with uncertainty from unknown camera poses to ensure multi-view consistency. Experimental results demonstrate that our approach can produce semantic consistency in deviated views for vehicle camera simulation.", "output": "MapNeRF: Incorporating Map Priors into Neural Radiance Fields for Driving View Simulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increased deployment of machine learning models in various real-world applications, researchers and practitioners alike have emphasized the need for explanations of model behaviour. To this end, two broad strategies have been outlined in prior literature to explain models. Post hoc explanation methods explain the behaviour of complex black-box models by highlighting features that are critical to model predictions; however, prior work has shown that these explanations may not be faithful, and even more concerning is our inability to verify them.
Specifically, it is nontrivial to evaluate if a given attribution is correct with respect to the underlying model. Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into the model architecture, meaning their explanations are naturally faithful and verifiable, but they often exhibit poor predictive performance due to their limited expressive power. In this work, we aim to bridge the gap between the aforementioned strategies by proposing Verifiability Tuning (VerT), a method that transforms black-box models into models that naturally yield faithful and verifiable feature attributions. We begin by introducing a formal theoretical framework to understand verifiability and show that attributions produced by standard models cannot be verified. We then leverage this framework to propose a method to build verifiable models and feature attributions out of fully trained black-box models. Finally, we perform extensive experiments on semi-synthetic and real-world datasets, and show that VerT produces models that (1) yield explanations that are correct and verifiable and (2) are faithful to the original black-box models they are meant to explain.", "output": "Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on ", "output": "How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset, where training and testing take place on the in-distribution dataset. However, their performance deteriorates significantly when presented with unseen samples. As a result, a reliable deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance.
Despite various attempts to enhance cross-dataset generalization, the problem remains challenging, particularly when testing against common post-processing perturbations such as video compression or blur. Hence, this study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability, withstanding common corruptions and enabling feature explainability. The framework comprises three key components: a feature extractor based on the vision Transformer architecture that is pre-trained via a self-supervised contrastive learning methodology, a graph convolution network coupled with a Transformer discriminator, and a graph Transformer relevancy map that provides a better understanding of manipulated regions and further explains the model's decision. To assess the effectiveness of the proposed framework, several challenging experiments are conducted, including in-data distribution performance, cross-dataset and cross-manipulation generalization, and robustness against common post-production perturbations. The results achieved demonstrate the remarkable effectiveness of the proposed deepfake detection framework, surpassing the current state-of-the-art approaches.", "output": "Self-Supervised Graph Transformer for Deepfake Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inspired by deep convolution segmentation algorithms, scene text detectors steadily break the performance ceiling of datasets. However, these methods often encounter threshold selection bottlenecks and perform poorly on text instances with extreme aspect ratios. In this paper, we propose to automatically learn the discriminative segmentation threshold, which distinguishes text pixels from background pixels for segmentation-based scene text detectors, thereby further reducing the time-consuming manual parameter adjustment. Besides, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) for capturing text instances with macro size and extreme aspect ratios. Following the GE-FPN, we introduce a cascade optimization structure to further refine the text instances. Finally, together with the proposed threshold learning strategy and text detection structure, we design an Adaptive Segmentation Network (ASNet) for scene text detection. Extensive experiments are carried out to demonstrate that the proposed ASNet can achieve state-of-the-art performance on four text detection benchmarks, i.e., ICDAR 2015, MSRA-TD500, ICDAR 2017 MLT and CTW1500. The ablation experiments also verify the effectiveness of our contributions.", "output": "Adaptive Segmentation Network for Scene Text Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent inversion methods have shown that real images can be inverted into StyleGAN's latent space and numerous edits can be achieved on those images thanks to the semantically rich feature representations of well-trained GAN models. However, extensive research has also shown that image inversion is challenging due to the trade-off between high-fidelity reconstruction and editability.
In this paper, we tackle an even more difficult task, inverting erased images into GAN's latent space for realistic inpainting and editing. Furthermore, by augmenting inverted latent codes with different latent samples, we achieve diverse inpaintings. Specifically, we propose to learn an encoder and a mixing network to combine encoded features from erased images with StyleGAN's mapped features from random samples. To encourage the mixing network to utilize both inputs, we train the networks with generated data via a novel set-up. We also utilize higher-rate features to prevent color inconsistencies between the inpainted and unerased parts. We run extensive experiments and compare our method with state-of-the-art inversion and inpainting methods. Qualitative metrics and visual comparisons show significant improvements.", "output": "Diverse Inpainting and Editing with GAN Inversion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The gradual nature of a diffusion process that synthesizes samples in small increments constitutes a key ingredient of Denoising Diffusion Probabilistic Models (DDPM), which have presented unprecedented quality in image synthesis and been recently explored in the motion domain. In this work, we propose to adapt the gradual diffusion concept (operating along a diffusion time-axis) into the temporal-axis of the motion sequence. Our key idea is to extend the DDPM framework to support temporally varying denoising, thereby entangling the two axes. Using our special formulation, we iteratively denoise a motion buffer that contains a set of increasingly-noised poses, which auto-regressively produces an arbitrarily long stream of frames. With a stationary diffusion time-axis, in each diffusion step we increment only the temporal-axis of the motion such that the framework produces a new, clean frame which is removed from the beginning of the buffer, followed by a newly drawn noise vector that is appended to it. This new mechanism paves the way towards a new framework for long-term motion synthesis with applications to character animation and other domains.", "output": "TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text. Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the standard sequence-to-sequence Transformer, and compare their performance in terms of accuracy and speed.
Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex. We employ pre-trained Transformers for both image understanding and language modeling. Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches for recognizing offline Arabic handwritten text.", "output": "A Transformer-based Approach for Arabic Offline Handwritten Text Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt tuning and adapter tuning have shown great potential in transferring pre-trained vision-language models (VLMs) to various downstream tasks. In this work, we design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection. Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage. To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen. When updating the mask, we introduce a novel gradient dropout strategy to regularize the parameter selection, in order to prevent the model from forgetting old knowledge and overfitting the downstream data. Experimental results on 11 datasets demonstrate the consistent superiority of our method over previous alternatives. It is noteworthy that we manage to deliver an 18.73% performance improvement compared to the zero-shot CLIP via masking an average of only 2.56% of parameters. Furthermore, our method is synergistic with most existing parameter-efficient tuning methods and can boost the performance on top of them. Project page can be found here (", "output": "Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for sensors, algorithms, and deep networks alike. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-painting ToM objects in images and processing them with a monocular depth estimation model. These labels can be used to fine-tune existing monocular or stereo networks, to let them learn how to deal with ToM surfaces. Experimental results on the Booster dataset show the dramatic improvements enabled by our remarkably simple proposal.", "output": "Learning Depth Estimation for Transparent and Mirror Surfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks. Our data and code are publicly available at:", "output": "PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nowadays, autonomous cars can drive smoothly in ordinary cases, and it is widely recognized that realistic sensor simulation will play a critical role in solving remaining corner cases by simulating them. To this end, we propose an autonomous driving simulator based upon neural radiance fields (NeRFs). Compared with existing works, ours has three notable features: (1) Instance-aware. Our simulator models the foreground instances and background environments separately with independent networks so that the static (e.g., size and appearance) and dynamic (e.g., trajectory) properties of instances can be controlled separately. (2) Modular. Our simulator allows flexible switching between different modern NeRF-related backbones, sampling strategies, input modalities, etc. We expect this modular design to boost academic progress and industrial deployment of NeRF-based autonomous driving simulation. (3) Realistic. Our simulator sets new state-of-the-art photo-realism results given the best module selection. Our simulator will be open-sourced while most of our counterparts are not. Project page: ", "output": "MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, suffer inevitably from real-world corruptions and perturbations and struggle to provide reliable depth predictions in such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation. This challenge was developed based on the newly established KITTI-C and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis on robust self-supervised and robust fully-supervised depth estimation, respectively. 
Out of more than two hundred participants, nine unique and top-performing solutions have emerged, with novel designs spanning the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses along with insightful observations are drawn to better understand the rationale behind each design. We hope this challenge could lay a solid foundation for future research on robust and reliable depth estimation and beyond. The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.", "output": "The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of Online Domain Adaptation for semantic segmentation is to handle unforeseeable domain changes that occur during deployment, like sudden weather events. However, the high computational costs associated with brute-force adaptation make this paradigm unfeasible for real-world applications. In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real-time domain adaptation. Our approach includes a hardware-aware back-propagation orchestration agent (HAMT) and a dedicated domain-shift detector that enables active control over when and how the model is adapted (LT). Thanks to these advancements, our approach is capable of performing semantic segmentation while simultaneously adapting at more than 29 FPS on a single consumer-grade GPU. Our framework's encouraging accuracy and speed trade-off is demonstrated on the OnDA and SHIFT benchmarks through experimental results.", "output": "To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. Existing methods assume access to paired training data, where the audio is observed in both source and target environments, but this limits the diversity of training data or requires the use of simulated data or heuristics to create paired samples. We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio -- without acoustically mismatched source audio for reference. Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric that quantifies the level of residual acoustic information in the de-biased audio. 
Training with either in-the-wild web data or simulated data, we demonstrate it outperforms the state-of-the-art on multiple challenging datasets and a wide variety of real-world audio and environments.", "output": "Self-Supervised Visual Acoustic Matching."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation by playing a regularization role. Finally, we validate our theoretical insights on MNIST. Based on the insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.", "output": "On the Generalization Effects of Linear Transformations in Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating 3D dances from music is an emerging research task that benefits many applications in vision and graphics. Previous works treat this task as sequence generation; however, it is challenging to render a music-aligned long-term sequence with high kinematic complexity and coherent movements. In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent rhythm-aligned movements. We name the proposed method DanceFormer, which includes two cascading kinematics-enhanced transformer-guided networks (called DanTrans) that tackle each stage, respectively. Furthermore, we propose a large-scale music-conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators rather than reconstruction or motion capture. This dataset also encodes dances as key poses and parametric motion curves apart from pose sequences, thus benefiting the training of our DanceFormer. Extensive experiments demonstrate that the proposed method, even trained on existing datasets, can generate fluent, performative, and music-matched 3D dances that surpass previous works quantitatively and qualitatively. 
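The augmentation-search idea in the linear-transformations abstract above (prefer transformations the model is uncertain about) can be sketched as follows. The entropy criterion and all names here are illustrative stand-ins for the paper's actual uncertainty measure.

```python
import torch
import torch.nn.functional as F

def pick_uncertain_augmentation(model, x, candidate_transforms):
    """Toy uncertainty-driven selection: apply each candidate transform and
    keep the one the model is least certain about (highest mean entropy)."""
    model.eval()
    best_t, best_entropy = None, -1.0
    with torch.no_grad():
        for t in candidate_transforms:
            probs = F.softmax(model(t(x)), dim=-1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean()
            if entropy > best_entropy:
                best_t, best_entropy = t, entropy.item()
    return best_t
```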
Moreover, the proposed DanceFormer, together with the PhantomDance dataset ( is seamlessly compatible with industrial animation software, thus facilitating adaptation for various downstream applications.", "output": "DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robust detection of vulnerable road users is a safety-critical requirement for the deployment of autonomous vehicles in heterogeneous traffic. One of the most complex outstanding challenges is that of partial occlusion, where a target object is only partially available to the sensor due to obstruction by another foreground object. A number of leading pedestrian detection benchmarks provide annotation for partial occlusion, however each benchmark varies greatly in its definition of the occurrence and severity of occlusion. Recent research demonstrates that a high degree of subjectivity is used to classify occlusion level in these cases and occlusion is typically categorized into 2 to 3 broad categories such as partially and heavily occluded. This can lead to inaccurate or inconsistent reporting of pedestrian detection model performance depending on which benchmark is used. This research introduces a novel, objective benchmark for partially occluded pedestrian detection to facilitate the objective characterization of pedestrian detection models. Characterization is carried out on seven popular pedestrian detection models for a range of occlusion levels from 0-99%, in order to demonstrate the efficacy and increased analysis capabilities of the proposed characterization method. Results demonstrate that pedestrian detection performance degrades, and the number of false negative detections increases, as pedestrian occlusion level increases. Of the seven popular pedestrian detection routines characterized, CenterNet has the greatest overall performance, followed by SSDlite. RetinaNet has the lowest overall detection performance across the range of occlusion levels.", "output": "The Impact of Partial Occlusion on Pedestrian Detectability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One powerful paradigm in visual navigation is to predict actions from observations directly. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data inefficient. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. 
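The self-supervised pretext task in the navigation abstract above (regress the location and size of a crop of the current view) is easy to sketch. The sample generator below uses synthetic noise images as described; sizes and the normalization scheme are illustrative assumptions.

```python
import torch

def make_crop_sample(img_size=64, crop_min=16, crop_max=48):
    """Build one (view, goal, target) triple for crop-prediction pre-training.

    The 'goal' is a random square crop of a synthetic noise image; the
    regression target is the crop's normalized location and size.
    """
    img = torch.rand(3, img_size, img_size)          # synthetic noise "view"
    w = int(torch.randint(crop_min, crop_max + 1, (1,)))
    x = int(torch.randint(0, img_size - w + 1, (1,)))
    y = int(torch.randint(0, img_size - w + 1, (1,)))
    goal = img[:, y:y + w, x:x + w]                  # "goal view" = the crop
    target = torch.tensor([x, y, w], dtype=torch.float32) / img_size
    return img, goal, target  # train a net: (img, goal) -> target
```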
The code is available at ", "output": "Visual Pre-training for Navigation: What Can We Learn from Noise?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we address panoramic semantic segmentation, which is under-explored due to two critical challenges: (1) image distortions and object deformations on panoramas; (2) lack of semantic annotations in the 360-degree imagery. To tackle these problems, first, we propose the upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules for handling object deformations and image distortions whenever (before or after adaptation) and wherever (shallow or deep levels). Second, we enhance the Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification for unsupervised domain adaptive panoramic segmentation. Third, aside from Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS) with 9,080 panoramic images, facilitating a Synthetic-to-Real (Syn2Real) adaptation scheme in 360-degree imagery. Extensive experiments are conducted, which cover indoor and outdoor scenarios, and each of them is investigated with Pin2Pan and Syn2Real regimens. Trans4PASS+ achieves state-of-the-art performance on four domain adaptive panoramic semantic segmentation benchmarks. Code is available at ", "output": "Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image editing and compositing have become ubiquitous in entertainment, from digital art to AR and VR experiences. To produce beautiful composites, the camera needs to be geometrically calibrated, which can be tedious and requires a physical calibration target. In place of the traditional multi-image calibration process, we propose to infer the camera calibration parameters such as pitch, roll, field of view, and lens distortion directly from a single image using a deep convolutional neural network. We train this network using automatically generated samples from a large-scale panorama dataset, yielding competitive accuracy in terms of standard $\\ell_2$ error. However, we argue that minimizing such standard error metrics might not be optimal for many applications. In this work, we investigate human sensitivity to inaccuracies in geometric camera calibration. To this end, we conduct a large-scale human perception study where we ask participants to judge the realism of 3D objects composited with correct and biased camera calibration parameters. Based on this study, we develop a new perceptual measure for camera calibration and demonstrate that our deep calibration network outperforms previous single-image based calibration methods both on standard metrics as well as on this novel perceptual measure. Finally, we demonstrate the use of our calibration network for several applications, including virtual object insertion, image retrieval, and compositing. 
A demonstration of our approach is available at .", "output": "A Deep Perceptual Measure for Lens and Camera Calibration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stochastic restoration algorithms allow exploring the space of solutions that correspond to the degraded input. In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a posterior sampler, and is thus required to be stochastic. Second, we illustrate that while deterministic restoration algorithms may attain high perceptual quality, this can be achieved only by filling up the space of all possible source images using an extremely sensitive mapping, which makes them highly vulnerable to adversarial attacks. Indeed, we show that enforcing deterministic models to be robust to such attacks profoundly hinders their perceptual quality, while robustifying stochastic models hardly influences their perceptual quality, and improves their output variability. These findings provide a motivation to foster progress in stochastic restoration methods, paving the way to better recovery algorithms.", "output": "Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The pedestrian trajectory prediction task is an essential component of intelligent systems. Its applications include but are not limited to autonomous driving, robot navigation, and anomaly detection in monitoring systems. Due to the diversity of motion behaviors and the complex social interactions among pedestrians, accurately forecasting their future trajectory is challenging. Existing approaches commonly adopt GANs or CVAEs to generate diverse trajectories. However, GAN-based methods do not directly model data in a latent space, which may make them fail to have full support over the underlying data distribution; CVAE-based methods optimize a lower bound on the log-likelihood of observations, which may cause the learned distribution to deviate from the underlying distribution. The above limitations make existing approaches often generate highly biased or inaccurate trajectories. In this paper, we propose a novel generative flow based framework with dual graphormer for pedestrian trajectory prediction (STGlow). Different from previous approaches, our method can more precisely model the underlying data distribution by optimizing the exact log-likelihood of motion behaviors. Besides, our method has clear physical meaning for simulating the evolution of human motion behaviors. The forward process of the flow gradually degrades complex motion behavior into simple behavior, while its reverse process represents the evolution of simple behavior into complex motion behavior. Further, we introduce a dual graphormer combined with the graph structure to more adequately model the temporal dependencies and the mutual spatial interactions. 
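STGlow's claim of optimizing the exact log-likelihood rests on the standard change-of-variables objective for normalizing flows, sketched below in its textbook form (not the authors' exact code).

```python
import math
import torch

def flow_nll_loss(z, log_det_jacobian):
    """Negative log-likelihood under the change of variables
    log p(x) = log N(z; 0, I) + log|det dz/dx|,
    the exact-likelihood objective that flow-based models optimize."""
    d = z.size(-1)
    log_pz = -0.5 * (z ** 2).sum(dim=-1) - 0.5 * d * math.log(2 * math.pi)
    return -(log_pz + log_det_jacobian).mean()
```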
Experimental results on several benchmarks demonstrate that our method achieves much better performance compared to previous state-of-the-art approaches.", "output": "STGlow: A Flow-based Generative Framework with Dual Graphormer for Pedestrian Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new method for exemplar-free class incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay, which can help recalibrate previous task classifiers to the feature drift which occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks, which for many applications may not be possible. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to the class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose a new method of feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal-based ViT methods.", "output": "Exemplar-free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a learning-based method, namely GeoUDF, to tackle the long-standing and challenging problem of reconstructing a discrete surface from a sparse point cloud. To be specific, we propose a geometry-guided learning method for UDF and its gradient estimation that explicitly formulates the unsigned distance of a query point as the learnable affine averaging of its distances to the tangent planes of neighboring points on the surface. Besides, we model the local geometric structure of the input point clouds by explicitly learning a quadratic polynomial for each point. This not only facilitates upsampling the input sparse point cloud but also naturally induces unoriented normals, which further augments UDF estimation. Finally, to extract triangle meshes from the predicted UDF, we propose a customized edge-based marching cube module. We conduct extensive experiments and ablation studies to demonstrate the significant advantages of our method over state-of-the-art methods in terms of reconstruction accuracy, efficiency, and generality. 
The source code is publicly available at ", "output": "GeoUDF: Surface Reconstruction from 3D Point Clouds via Geometry-guided Distance Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.", "output": "MixupE: Understanding and Improving Mixup from Directional Derivative Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip connection path by including the cross-attention module to ensure feature reusability and enhance localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at ", "output": "DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each represented by a classical or neural field representation that operates on transformed input coordinates. This decomposition results in a unified framework that accommodates several recent signal representations including NeRF, Plenoxels, EG3D, Instant-NGP, and TensoRF. Additionally, our framework allows for the creation of powerful new signal representations, such as the \"Dictionary Field\" (DiF), which is a second contribution of this paper. 
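For reference, the vanilla Mixup operation that MixupE builds on, linear interpolation of input pairs and their labels, looks as follows; MixupE's additional directional-derivative regularization is not shown.

```python
import torch

def mixup(x, y_onehot, alpha=0.2):
    """Vanilla Mixup: convex combinations of a batch and a permuted copy.

    x: (N, ...) inputs; y_onehot: (N, C) one-hot labels; alpha: Beta parameter.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```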
Our experiments show that DiF leads to improvements in approximation quality, compactness, and training time when compared to previous fast reconstruction methods. Experimentally, our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields, and higher compactness for radiance field reconstruction tasks. Furthermore, DiF enables generalization to unseen images/3D scenes by sharing bases across signals during training, which greatly benefits use cases such as image regression from sparse observations and few-shot radiance field reconstruction.", "output": "Factor Fields: A Unified Framework for Neural Fields and Beyond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the application of emerging machine learning methods from image super-resolution (SR) to the task of statistical downscaling. We specifically focus on convolutional neural network-based Generative Adversarial Networks (GANs). Our GANs are conditioned on low-resolution (LR) inputs to generate high-resolution (HR) surface winds emulating Weather Research and Forecasting (WRF) model simulations over North America. Unlike traditional SR models, where LR inputs are idealized coarsened versions of the HR images, WRF emulation involves using non-idealized LR and HR pairs, resulting in shared-scale mismatches due to internal variability. Our study builds upon current SR-based statistical downscaling by experimenting with a novel frequency-separation (FS) approach from the computer vision field. To assess the skill of SR models, we carefully select evaluation metrics and focus on performance measures based on spatial power spectra. Our analyses reveal how GAN configurations influence spatial structures in the generated fields, particularly biases in spatial variability spectra. Using power spectra to evaluate the FS experiments reveals that successful applications of FS in computer vision do not translate to climate fields. However, the FS experiments demonstrate the sensitivity of power spectra to a commonly used GAN-based SR objective function, which helps interpret and understand its role in determining spatial structures. This result motivates the development of a novel partial frequency-separation scheme as a promising configuration option. We also quantify the influence on GAN performance of non-idealized LR fields resulting from internal variability. Furthermore, we conduct a spectra-based feature-importance experiment, allowing us to explore the dependence of the spatial structure of generated fields on different physically relevant LR covariates.", "output": "Algorithmic Hallucinations of Near-Surface Winds: Statistical Downscaling with Generative Adversarial Networks to Convection-Permitting Scales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR), since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. 
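The spatial power-spectrum diagnostics emphasized in the downscaling abstract above are typically computed as a radially averaged FFT power; a generic sketch is given below (binning details are illustrative, not the study's exact procedure).

```python
import numpy as np

def radial_power_spectrum(field):
    """Radially averaged 2D power spectrum of a square or rectangular field.

    Returns mean FFT power in rings of equal integer wavenumber.
    """
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    h, w = field.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)   # ring index per pixel
    spectrum = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return spectrum / np.maximum(counts, 1)
```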
To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed common rationales in this paper. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective, without using any pre-trained object parts or saliency detectors, making it seamless to integrate with the existing SSL process. Specifically, we fit the GradCAM with a branch of limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method can lead to a significant improvement in different evaluation settings.", "output": "Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns over patient data. Moreover, low-count PET data at different institutions may have different data distributions, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need to aggregate local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. 
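The common-rationale entry above fits GradCAM with a limited-capacity branch and reuses its output as spatial pooling weights at test time. The module below is a hypothetical rendering of that idea; the layer sizes and the MSE fitting loss are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RationaleBranch(nn.Module):
    """Low-capacity branch that regresses a GradCAM map from backbone features
    and, at test time, turns its prediction into spatial pooling weights."""
    def __init__(self, in_ch, hidden=8):  # small `hidden` limits capacity
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1), nn.ReLU(), nn.Conv2d(hidden, 1, 1))

    def forward(self, feats, gradcam=None):
        weights = self.net(feats)                              # (B, 1, H, W)
        loss = F.mse_loss(weights, gradcam) if gradcam is not None else None
        attn = weights.flatten(2).softmax(-1).view_as(weights) # spatial weights
        pooled = (feats * attn).sum(dim=(2, 3))                # weighted pooling
        return pooled, loss
```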
We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.", "output": "FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present the USTC FLICAR Dataset, which is dedicated to the development of simultaneous localization and mapping and precise 3D reconstruction of the workspace for heavy-duty autonomous aerial work robots. In recent years, numerous public datasets have played significant roles in the advancement of autonomous cars and unmanned aerial vehicles (UAVs). However, these two platforms differ from aerial work robots: UAVs are limited in their payload capacity, while cars are restricted to two-dimensional movements. To fill this gap, we create the \"Giraffe\" mapping robot based on a bucket truck, which is equipped with a variety of well-calibrated and synchronized sensors: four 3D LiDARs, two stereo cameras, two monocular cameras, Inertial Measurement Units (IMUs), and a GNSS/INS system. A laser tracker is used to record the millimeter-level ground truth positions. We also make its ground twin, the \"Okapi\" mapping robot, to gather data for comparison. The proposed dataset extends the typical autonomous driving sensing suite to aerial scenes, demonstrating the potential of combining autonomous driving perception systems with bucket trucks to create a versatile autonomous aerial working platform. Moreover, based on the Segment Anything Model (SAM), we produce the Semantic FLICAR dataset, which provides fine-grained semantic segmentation annotations for multimodal continuous data in both temporal and spatial dimensions. The dataset is available for download at: ", "output": "USTC FLICAR: A Sensors Fusion Dataset of LiDAR-Inertial-Camera for Heavy-duty Autonomous Aerial Work Robots."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias terms and newly-added scaling factors in specific layers, yet results in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one with minimal added cost. 
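FedFTN's communication pattern, averaging only the shared denoiser while each institution's FTN stays local, can be sketched as a simple server-side aggregation step; the names and the plain averaging rule are illustrative assumptions.

```python
import copy
import torch

def federated_round(global_denoiser_state, client_states):
    """Average the shared denoising network's weights across institutions.

    `client_states` contains only denoiser state dicts; the local feature
    transformation networks (FTNs) are excluded by construction and never
    leave their sites.
    """
    avg = copy.deepcopy(global_denoiser_state)
    for key in avg:
        avg[key] = torch.stack([c[key].float() for c in client_states]).mean(0)
    return avg  # broadcast back to all institutions
```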
Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\\times$256 checkpoint, while being 30$\\times$ more training efficient than the closest competitor.", "output": "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly-supervised temporal action localization aims to identify and localize the action instances in untrimmed videos with only video-level action labels. When humans watch videos, we can adapt our abstract-level knowledge about actions in different video scenarios and detect whether some actions are occurring. In this paper, we mimic this human capability and bring a new perspective for locating and identifying multiple actions in a video. We propose a network named VQK-Net with video-specific query-key attention modeling that learns a unique query for each action category of each input video. The learned queries not only contain the actions' knowledge features at the abstract level but also have the ability to fit this knowledge into the target video scenario, and they will be used to detect the presence of the corresponding action along the temporal dimension. To better learn these action category queries, we exploit not only the features of the current input video but also the correlation between different videos, through a novel video-specific action category query learner that works with a query similarity loss. Finally, we conduct extensive experiments on three commonly used datasets (THUMOS14, ActivityNet1.2, and ActivityNet1.3) and achieve state-of-the-art performance.", "output": "Video-Specific Query-Key Attention Modeling for Weakly-Supervised Temporal Action Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Atmospheric Turbulence (AT) correction is a challenging restoration task as it consists of two distortions: geometric distortion and spatially variant blur. Diffusion models have shown impressive accomplishments in photo-realistic image synthesis and beyond. In this paper, we propose a novel deep conditional diffusion model under a variational inference framework to solve the AT correction problem. We use this framework to improve performance by learning latent prior information from the input and degradation processes. We use the learned information to further condition the diffusion model. Experiments are conducted on a comprehensive synthetic AT dataset. We show that the proposed framework achieves good quantitative and qualitative results.", "output": "Atmospheric Turbulence Correction via Variational Deep Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current adversarial attacks on motion estimation, or optical flow, optimize small per-pixel perturbations, which are unlikely to appear in the real world. In contrast, adverse weather conditions constitute a much more realistic threat scenario. 
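The DiffFit recipe summarized above, training only bias terms plus newly added per-layer scaling factors, admits a compact sketch. Selecting parameters by name and using a single scalar gain per block are simplifications for illustration, not the paper's exact placement of the scale factors.

```python
import torch
import torch.nn as nn

def mark_bias_and_scale_trainable(model: nn.Module):
    """Freeze a pre-trained model except bias terms and scale factors."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias") or "scale_factor" in name

class ScaledBlock(nn.Module):
    """Wrap a frozen block with a learnable scalar gain (a 'scaling factor')."""
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.scale_factor = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.scale_factor * self.block(x)
```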
Hence, in this work, we present a novel attack on motion estimation that exploits adversarially optimized particles to mimic weather effects like snowflakes, rain streaks, or fog clouds. At the core of our attack framework is a differentiable particle rendering system that integrates particles (i) consistently over multiple time steps, (ii) into the 3D space, and (iii) with a photo-realistic appearance. Through optimization, we obtain adversarial weather that significantly impacts the motion estimation. Surprisingly, methods that previously showed good robustness towards small per-pixel perturbations are particularly vulnerable to adversarial weather. At the same time, augmenting the training with non-optimized weather increases a method's robustness towards weather effects and improves generalizability at almost no additional cost. Our code will be available at ", "output": "Distracting Downpour: Adversarial Weather Attacks for Motion Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although the Domain Generalization (DG) problem has been fast-growing in 2D image tasks, its exploration on 3D point cloud data is still insufficient and challenged by more complex and uncertain cross-domain variances with uneven inter-class modality distribution. In this paper, different from previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that only leverages a single source dataset to alleviate the unforeseen domain differences faced by a well-trained source model. Specifically, we first design a Multi-grained Sub-domain Alignment (MSA) method, which can constrain the learned representations to be domain-agnostic and discriminative, by performing a multi-grained feature alignment process between the split sub-domains from the single source dataset. Then, a Sample-level Domain-aware Attention (SDA) strategy is presented, which can selectively enhance easy-to-adapt samples from different sub-domains according to the sample-level inter-domain distance to avoid negative transfer. Experiments demonstrate that our SUG can boost the generalization ability for unseen target domains, even outperforming the existing unsupervised domain adaptation methods that have to access extensive target domain data. Our code is available at ", "output": "SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a fully automatic registration method for dental cone-beam computed tomography (CBCT) and face scan data. It can be used for a digital platform of 3D jaw-teeth-face models in a variety of applications, including 3D digital treatment planning and orthognathic surgery. Difficulties in accurately merging facial scans and CBCT images arise from the different image acquisition methods and the limited area of correspondence between the two facial surfaces. In addition, it is difficult to use machine learning techniques because they require face-related 3D medical data with radiation exposure, which are difficult to obtain for training. 
The proposed method addresses these problems by reusing an existing machine-learning-based 2D landmark detection algorithm from an open-source library and developing a novel mathematical algorithm that identifies paired 3D landmarks from knowledge of the corresponding 2D landmarks. A main contribution of this study is that the proposed method does not require annotated training data of facial landmarks, because it uses a pre-trained facial landmark detection algorithm that is known to be robust and generalized to various 2D face image models. Note that this reduces a 3D landmark detection problem to a 2D problem of identifying the corresponding landmarks on two 2D projection images generated from two different projection angles. Here, the 3D landmarks for registration were selected from the sub-surfaces with the least geometric change under the CBCT and face scan environments. For the final fine-tuning of the registration, the Iterative Closest Point method was applied, which utilizes geometrical information around the 3D landmarks. The experimental results show that the proposed method achieved an average surface distance error of 0.74 mm for three pairs of CBCT and face scan datasets.", "output": "Automatic 3D Registration of Dental CBCT and Face Scan Data using 2D Projection Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works show that the data distribution in a network's latent space is useful for estimating classification uncertainty and detecting out-of-distribution (OOD) samples. To obtain a well-regularized latent space that is conducive to uncertainty estimation, existing methods bring in significant changes to model architectures and training procedures. In this paper, we present a lightweight, fast, and high-performance regularization method for Mahalanobis distance-based uncertainty prediction that requires minimal changes to the network's architecture. To derive Gaussian latent representations favourable for Mahalanobis distance calculation, we introduce a self-supervised representation learning method that separates in-class representations into multiple Gaussians. Classes with non-Gaussian representations are automatically identified and dynamically clustered into multiple new classes that are approximately Gaussian. Evaluation on standard OOD benchmarks shows that our method achieves state-of-the-art results on OOD detection with minimal inference time, and is very competitive on predictive probability calibration. Finally, we show the applicability of our method to a real-life computer vision use case on microorganism classification.", "output": "Gaussian Latent Representations for Uncertainty Estimation using Mahalanobis Distance in Deep Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Images captured under low-light conditions present unpleasant artifacts, which debilitate the performance of feature extraction for many upstream visual tasks. Low-light image enhancement aims at improving brightness and contrast, and further reducing noise that corrupts the visual quality. Recently, many image restoration methods based on Swin Transformer have been proposed and achieve impressive performance. 
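The Mahalanobis-distance scoring that the uncertainty-estimation entry above regularizes for is itself standard: the distance of a latent feature to the nearest class-conditional Gaussian under a shared covariance, as sketched below.

```python
import torch

def mahalanobis_scores(feats, class_means, precision):
    """Minimum squared Mahalanobis distance to any class-conditional Gaussian.

    feats: (N, D) latent features; class_means: iterable of (D,) means;
    precision: (D, D) shared inverse covariance. Lower = more in-distribution.
    """
    scores = []
    for mu in class_means:                      # one mean per (sub)class
        d = feats - mu                          # (N, D) deviations
        m = (d @ precision * d).sum(dim=1)      # squared Mahalanobis distance
        scores.append(m)
    return torch.stack(scores, dim=1).min(dim=1).values
```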
However, on the one hand, trivially employing the Swin Transformer for low-light image enhancement would expose some artifacts, including over-exposure, brightness imbalance, and noise corruption. On the other hand, it is impractical to capture image pairs of low-light images and corresponding ground truth, i.e., well-exposed images of the same visual scene. In this paper, we propose a dual-branch network based on Swin Transformer, guided by a signal-to-noise ratio prior map that provides spatially-varying information for low-light image enhancement. Moreover, we leverage unsupervised learning to construct the optimization objective based on the Retinex model, to guide the training of the proposed network. Experimental results demonstrate that the proposed model is competitive with the baseline models.", "output": "Unsupervised Low Light Image Enhancement Using SNR-Aware Swin Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Art curatorial practice is characterized by the presentation of an art collection in a knowledgeable way. Machine processes are characterized by their capacity to manage and analyze large amounts of data. This paper envisages AI curation and audience interaction to explore the implications of contemporary machine learning models for the curatorial world. This project was developed for the occasion of the 2023 Helsinki Art Biennial, entitled New Directions May Emerge. We use the Helsinki Art Museum (HAM) collection to re-imagine the city of Helsinki through the lens of machine perception. We use visual-textual models to place indoor artworks in public spaces, assigning fictional coordinates based on similarity scores. We transform the space that each artwork inhabits in the city by generating synthetic 360 art panoramas. We guide the generation by estimating depth values from 360 panoramas at each artwork location and by machine-generated prompts of the artworks. The result of this project is an AI curation that places the artworks in their imagined physical space, blurring the lines of artwork, context, and machine perception. The work is virtually presented as a web-based installation on this link, where users can navigate an alternative version of the city while exploring and interacting with its cultural heritage at scale.", "output": "AI Art Curation: Re-imagining the city of Helsinki in occasion of its Biennial."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, encoders like ViT (vision transformer) and ResNet have been trained on vast datasets and utilized as perceptual metrics for comparing sketches and images, as well as multi-domain encoders in a zero-shot setting. However, there has been limited effort to quantify the granularity of these encoders. Our work addresses this gap by focusing on multi-modal 2D projections of individual 3D instances. This task holds crucial implications for retrieval and sketch-based modeling. We show that in a zero-shot setting, the more abstract the sketch, the higher the likelihood of incorrect image matches. Even within the same sketch domain, sketches of the same object drawn in different styles, for example by distinct individuals, might not be accurately matched. 
One of the key findings of our research is that meticulous fine-tuning on one class of 3D shapes can lead to improved performance on other shape classes, reaching or surpassing the accuracy of supervised methods. We compare and discuss several fine-tuning strategies. Additionally, we delve deeply into how the scale of an object in a sketch influences the similarity of features at different network layers, helping us identify which network layers provide the most accurate matching. Significantly, we discover that ViT and ResNet perform best when dealing with similar object scales. We believe that our work will have a significant impact on research in the sketch domain, providing insights and guidance on how to adopt large pretrained models as perceptual losses.", "output": "Fine-Tuned but Zero-Shot 3D Shape Sketch View Similarity and Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects in their surroundings. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before being sent to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression codec to reduce the overhead in latency incurred by the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation, in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor of up to $66\\times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$, as compared to $0.88$ achieved using decompressed images, while reducing the overall compute by $11\\%$.", "output": "Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Extracting image semantics effectively and assigning corresponding labels to multiple objects or attributes in natural images is challenging due to the complex scene contents and confusing label dependencies. Recent works have focused on modeling label relationships with graphs and understanding object regions using class activation maps (CAM). However, these methods ignore the complex intra- and inter-category relationships among specific semantic features, and CAM is prone to generating noisy information. To this end, we propose a novel semantic-aware dual contrastive learning framework that incorporates sample-to-sample contrastive learning (SSCL) as well as prototype-to-sample contrastive learning (PSCL). Specifically, we leverage semantic-aware representation learning to extract category-related local discriminative features and construct category prototypes. 
Then, based on SSCL, label-level visual representations of the same category are aggregated together, and features belonging to distinct categories are separated. Meanwhile, we construct a novel PSCL module to narrow the distance between positive samples and category prototypes and push negative samples away from the corresponding category prototypes. Finally, the discriminative label-level features related to the image content are accurately captured by the joint training of the above three parts. Experiments on five challenging large-scale public datasets demonstrate that our proposed method is effective and outperforms the state-of-the-art methods. Code and supplementary materials are released on ", "output": "Semantic-Aware Dual Contrastive Learning for Multi-label Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models have not been able to fully utilize radar information, in that initial 3D proposals were generated based on the camera features only, with instance-level fusion conducted subsequently. In this paper, we propose radar-camera multi-level fusion (RCM-Fusion), which fuses radar and camera modalities at both the feature level and instance level to fully utilize radar information. At the feature level, we propose a Radar Guided BEV Encoder which utilizes radar Bird's-Eye-View (BEV) features to transform image features into precise BEV representations and then adaptively combines the radar and camera BEV features. At the instance level, we propose a Radar Grid Point Refinement module that reduces localization error by considering the characteristics of the radar point clouds. The experiments conducted on the public nuScenes dataset demonstrate that our proposed RCM-Fusion offers an 11.8% performance gain in nuScenes detection score (NDS) over the camera-only baseline model and achieves state-of-the-art performance among radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code will be made publicly available.", "output": "RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. 
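The prototype-to-sample objective (PSCL) described above is, in spirit, an InfoNCE-style loss between features and category prototypes; the sketch below shows that general form, which may differ from the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(feats, labels, prototypes, tau=0.1):
    """Pull each feature toward its category prototype, away from the others.

    feats: (N, D) features; labels: (N,) category indices;
    prototypes: (C, D) one prototype per category; tau: temperature.
    """
    feats = F.normalize(feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits = feats @ protos.t() / tau           # (N, C) similarities
    return F.cross_entropy(logits, labels)
```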
Specifically, TwinLiteNet achieves a mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters, and achieves 415 FPS on an RTX A5000 GPU. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60 FPS on a Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: \\url{", "output": "TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly-supervised change detection (WSCD) aims to detect pixel-level changes with only image-level annotations. Owing to its label efficiency, WSCD has drawn increasing attention recently. However, current WSCD methods often encounter the challenge of change missing and fabricating, i.e., the inconsistency between image-level annotations and pixel-level predictions. Specifically, change missing refers to the situation in which the WSCD model fails to predict any changed pixels even though the image-level label indicates changed, and vice versa for change fabricating. To address this challenge, in this work, we leverage global-scale and local-scale priors in WSCD and propose two components: a Dilated Prior (DP) decoder and a Label Gated (LG) constraint. The DP decoder decodes samples with the changed image-level label, skips samples with the unchanged label, and replaces them with an all-unchanged pixel-level label. The LG constraint is derived from the correspondence between changed representations and image-level labels, penalizing the model when it mispredicts the change status. Additionally, we develop TransWCD, a simple yet powerful transformer-based model, showcasing the potential of weakly-supervised learning in change detection. By integrating the DP decoder and LG constraint into TransWCD, we form TransWCD-DL. Our proposed TransWCD and TransWCD-DL achieve significant +6.33% and +9.55% F1 score improvements over the state-of-the-art methods on the WHU-CD dataset, respectively. Some performance metrics even exceed several fully-supervised change detection (FSCD) competitors. Code will be available at", "output": "Exploring Effective Priors and Efficient Models for Weakly-Supervised Change Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics. Recently, deep SNNs trained directly have shown great success in achieving high performance on classification tasks with very few time steps. However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To address this problem, we propose EMS-YOLO, a novel directly-trained SNN framework for object detection, which is the first attempt to train a deep SNN with surrogate gradients for object detection rather than using ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly-trained SNN with low power consumption. 
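The Label Gated constraint described in the TransWCD entry above penalizes predictions that contradict the image-level label (missed changes and fabricated changes). A toy version using the per-image maximum change probability is sketched below; the paper's exact formulation may differ.

```python
import torch

def label_gated_penalty(change_probs, image_labels, eps=1e-6):
    """Encourage agreement between pixel predictions and image-level labels.

    change_probs: (B, H, W) per-pixel change probabilities;
    image_labels: (B,) with 1 = changed, 0 = unchanged.
    """
    max_prob = change_probs.flatten(1).max(dim=1).values              # (B,)
    pos = -max_prob.clamp_min(eps).log() * image_labels               # missed
    neg = -(1 - max_prob).clamp_min(eps).log() * (1 - image_labels)   # fabricated
    return (pos + neg).mean()
```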
Furthermore, we theoretically analyze and prove that EMS-ResNet can avoid gradient vanishing or exploding. The results demonstrate that our approach outperforms the state-of-the-art ANN-SNN conversion methods (at least 500 time steps) with far fewer time steps (only 4 time steps). It is shown that our model can achieve comparable performance to the ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO Dataset and the event-based Gen1 Dataset.", "output": "Deep Directly-Trained Spiking Neural Networks for Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-centric visual understanding is an important desideratum for effective human-robot interaction. In order to navigate crowded public places, social robots must be able to interpret the activity of the surrounding humans. This paper addresses one key aspect of human-centric visual understanding, multi-person pose estimation. Achieving good performance on multi-person pose estimation in crowded scenes is difficult due to the challenges of occluded joints and instance separation. In order to tackle these challenges and overcome the limitations of image features in representing invisible body parts, we propose a novel prompt-based pose inference strategy called LAMP (Language Assisted Multi-person Pose estimation). By utilizing the text representations generated by a well-trained language model (CLIP), LAMP can facilitate the understanding of poses on the instance and joint levels, and learn more robust visual representations that are less susceptible to occlusion. This paper demonstrates that language-supervised training boosts the performance of single-stage multi-person pose estimation, and both instance-level and joint-level prompts are valuable for training. The code is available at ", "output": "LAMP: Leveraging Language Prompts for Multi-person Pose Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper provides a comprehensive analysis of information leakage during distance evaluation, with an emphasis on threshold-based obfuscated distance (i.e., Fuzzy Matcher). Leakage can occur due to a malware infection or the use of a weakly privacy-preserving matcher, exemplified by side-channel attacks or partially obfuscated designs. We provide an exhaustive catalog of information leakage scenarios as well as their impacts on the security concerning data privacy. Each of the scenarios leads to generic attacks whose impacts are expressed in terms of computational costs, hence allowing the establishment of upper bounds on the security level.", "output": "A Comprehensive Analysis on the Leakage of Fuzzy Matchers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing (IM). Recently many advanced algorithms have been published, but most of them cannot meet the needs of IM. 
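The EMS-YOLO abstract above relies on surrogate gradients to train spiking neurons directly. A minimal PyTorch sketch of the general mechanism (a Heaviside spike in the forward pass, a hand-picked rectangular surrogate in the backward pass) could look like this; the decay, threshold, and surrogate window are illustrative assumptions, not EMS-YOLO's actual choices:

```python
import torch

class SpikeSurrogate(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v >= 0.0).float()          # fire when membrane crosses threshold

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Pass gradients only near the threshold (window of width 1).
        return grad_out * (v.abs() < 0.5).float()

def lif_forward(currents, decay=0.5, v_th=1.0):
    """Run a leaky integrate-and-fire neuron over T time steps.

    currents: (T, B) input currents; returns (T, B) binary spikes.
    """
    v = torch.zeros_like(currents[0])
    spikes = []
    for i_t in currents:
        v = decay * v + i_t               # leaky integration
        s = SpikeSurrogate.apply(v - v_th)
        v = v * (1.0 - s)                 # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)

x = torch.randn(4, 3, requires_grad=True)  # 4 time steps, batch of 3
out = lif_forward(x)
out.sum().backward()                       # surrogate gradient flows to x
print(out.shape, x.grad is not None)
```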
There are several disadvantages: i) they are difficult to deploy on production lines since their algorithms heavily rely on large pre-trained models; ii) they hugely increase storage overhead due to overuse of memory banks; iii) their inference speed cannot reach real-time requirements. To overcome these issues, we propose an easy and deployment-friendly network (called EasyNet) without using pre-trained models and memory banks: firstly, we design a multi-scale multi-modality feature encoder-decoder to accurately reconstruct the segmentation maps of anomalous regions and encourage the interaction between RGB images and depth images; secondly, we adopt a multi-modality anomaly segmentation network to achieve a precise anomaly map; thirdly, we propose an attention-based information entropy fusion module for feature fusion during inference, making it suitable for real-time deployment. Extensive experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks. In addition, EasyNet is faster than existing methods, with a high frame rate of 94.55 FPS on a Tesla V100 GPU.", "output": "EasyNet: An Easy Network for 3D Industrial Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-agent embodied tasks have recently been studied in complex indoor visual environments. Collaboration among multiple agents can improve work efficiency and has significant practical value. However, most of the existing research focuses on homogeneous multi-agent tasks. Compared with homogeneous agents, heterogeneous agents can leverage their different capabilities to allocate corresponding sub-tasks and cooperate to complete complex tasks. Heterogeneous multi-agent tasks are common in real-world scenarios, and the collaboration strategy among heterogeneous agents is a challenging and important problem to be solved. To study collaboration among heterogeneous agents, we propose the heterogeneous multi-agent tidying-up task, in which multiple heterogeneous agents with different capabilities collaborate with each other to detect misplaced objects and place them in reasonable locations. This is a demanding task since it requires agents to make the best use of their different capabilities to conduct reasonable task planning and complete the whole task. To solve this task, we build a heterogeneous multi-agent tidying-up benchmark dataset in a large number of houses with multiple rooms based on ProcTHOR-10K. We propose a hierarchical decision model based on misplaced object detection and reasonable receptacle prediction, as well as a handshake-based group communication mechanism. Extensive experiments are conducted to demonstrate the effectiveness of the proposed model. The project's website and videos of experiments can be found at ", "output": "Heterogeneous Embodied Multi-Agent Collaboration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Palmprint recently shows great potential in recognition applications as it is a privacy-friendly and stable biometric. However, the lack of large-scale public palmprint datasets limits further research and development of palmprint recognition. In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive identities. 
We first introduce a conditional modulation generator to improve the intra-class diversity. Then an identity-aware loss is proposed to ensure identity consistency against unpaired training. We further improve the B\\'ezier palm creases generation strategy to guarantee identity independence. Extensive experimental results demonstrate that synthetic pretraining significantly boosts the recognition model performance. For example, our model improves the state-of-the-art B\\'ezierPalm by more than $5\\%$ and $14\\%$ in terms of TAR@FAR=1e-6 under the $1:1$ and $1:3$ Open-set protocols. When accessing only $10\\%$ of the real training data, our method still outperforms ArcFace with $100\\%$ real training data, indicating that we are closer to real-data-free palmprint recognition.", "output": "RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Shape generation is the practice of producing 3D shapes as various representations for 3D content creation. Previous studies on 3D shape generation have focused on shape quality and structure, with little or no consideration of the importance of semantic information. Consequently, such generative models often fail to preserve the semantic consistency of shape structure or enable manipulation of the semantic attributes of shapes during generation. In this paper, we propose a novel semantic generative model named 3D Semantic Subspace Traverser that utilizes semantic attributes for category-specific 3D shape generation and editing. Our method utilizes implicit functions as the 3D shape representation and combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes. Each dimension of the subspace corresponds to a particular semantic attribute, and we can edit the attributes of generated shapes by traversing the coefficients of those dimensions. Experimental results demonstrate that our method can produce plausible shapes with complex structures and enable the editing of semantic attributes. The code and trained models are available at", "output": "3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling, which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. 
Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.", "output": "Pre-Training with Diffusion models for Dental Radiography segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel method for single-view 3D style transfer that generates a unique 3D object with both shape and texture transfer. Our focus lies primarily on birds, a popular subject in 3D reconstruction, for which no existing single-view 3D transfer methods have been developed. The method we propose seeks to generate a 3D mesh shape and texture of a bird from two single-view images. To achieve this, we introduce a novel shape transfer generator that comprises a dual residual gated network (DRGNet) and a multi-layer perceptron (MLP). DRGNet extracts the features of source and target images using a shared coordinate gate unit, while the MLP generates spatial coordinates for building a 3D mesh. We also introduce a semantic UV texture transfer module that implements textural style transfer using semantic UV segmentation, which ensures consistency in the semantic meaning of the transferred regions. This module can be widely adapted to many existing approaches. Finally, our method constructs a novel 3D bird using a differentiable renderer. Experimental results on the CUB dataset verify that our method achieves state-of-the-art performance on the single-view 3D style transfer task. Code is available in ", "output": "Creative Birds: Self-Supervised Single-View 3D Style Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recognizing handwritten digits is a challenging task primarily due to the diversity of writing styles and the presence of noisy images. The widely used MNIST dataset, which is commonly employed as a benchmark for this task, includes distorted digits with irregular shapes, incomplete strokes, and varying skew in both the training and testing datasets. Consequently, these factors contribute to reduced accuracy in digit recognition. To overcome this challenge, we propose a two-stage deep learning approach. In the first stage, we create a simple neural network to identify distorted digits within the training set. This model serves to detect and filter out such distorted and ambiguous images. In the second stage, we exclude these identified images from the training dataset and proceed to retrain the model using the filtered dataset. This process aims to improve the classification accuracy and confidence levels while mitigating issues of underfitting and overfitting. Our experimental results demonstrate the effectiveness of the proposed approach, achieving an accuracy rate of over 99.5% on the testing dataset. This significant improvement showcases the potential of our method in enhancing digit classification accuracy. 
In our future work, we intend to explore the scalability of this approach and investigate techniques to further enhance accuracy by reducing the size of the training data.", "output": "Pruning Distorted Images in MNIST Handwritten Digits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, and a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at", "output": "Multi-objective Deep Reinforcement Learning for Mobile Edge Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural operations that rely on neighborhood information are much more expensive when deployed on point clouds than on grid data due to the irregular distances between points in a point cloud. In a grid, on the other hand, we can compute the kernel only once and reuse it for all query positions. As a result, operations that rely on neighborhood information scale much worse for point clouds than for grid data, especially for large inputs and large neighborhoods. In this work, we address the scalability issue of point cloud methods by tackling its root cause: the irregularity of the data. We propose learnable gridification as the first step in a point cloud processing pipeline to transform the point cloud into a compact, regular grid. Thanks to gridification, subsequent layers can use operations defined on regular grids, e.g., Conv3D, which scale much better than native point cloud methods. We then extend gridification to point cloud to point cloud tasks, e.g., segmentation, by adding a learnable de-gridification step at the end of the point cloud processing pipeline to map the compact, regular grid back to its original point cloud form. Through theoretical and empirical analysis, we show that gridified networks scale better in terms of memory and time than networks directly applied on raw point cloud data, while being able to achieve competitive results. 
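The two-stage filter-and-retrain recipe in the MNIST abstract above can be illustrated with a toy scikit-learn pipeline. The classifier, the small digits stand-in dataset, and the 0.5 confidence threshold are all assumptions for illustration, since the paper's own network and filtering criterion are not given in the abstract:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small stand-in for MNIST and split it.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: fit a simple classifier and flag low-confidence training samples
# as "distorted"/ambiguous (a proxy for the paper's filtering network).
stage1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
conf = stage1.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
keep = conf >= 0.5            # assumed threshold; the paper does not give one

# Stage 2: retrain on the filtered training set only.
stage2 = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
print(f"kept {keep.mean():.0%} of training data, "
      f"test accuracy {stage2.score(X_te, y_te):.3f}")
```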
Our code is publicly available at", "output": "Learned Gridification for Efficient Point Cloud Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimization methods are essential in solving complex problems across various domains. In this research paper, we introduce a novel optimization method called Gaussian Crunching Search (GCS). Inspired by the behaviour of particles in a Gaussian distribution, GCS aims to efficiently explore the solution space and converge towards the global optimum. We present a comprehensive analysis of GCS, including its working mechanism and potential applications. Through experimental evaluations and comparisons with existing optimization methods, we highlight the advantages and strengths of GCS. This research paper serves as a valuable resource for researchers, practitioners, and students interested in optimization, providing insights into the development and potential of Gaussian Crunching Search as a new and promising approach.", "output": "A new derivative-free optimization method: Gaussian Crunching Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study presents an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe to classify gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset. The results were compared against well-known transformers such as BERT, ELECTRA, RoBERTa, XLNet, DistilBERT, and their LSTM ensembles. Our model outperformed all other models in terms of accuracy, precision, recall, F1 score, and Mean Squared Error. Surprisingly, it also needed less training time, resulting in a perfect combination of performance and efficiency. This study demonstrates the utility of ensemble models for difficult tasks such as gene mutation classification.", "output": "A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional neural networks (CNNs) have been shown to both extract more information than the traditional two-point statistics from cosmological fields, and marginalise over astrophysical effects extremely well. However, CNNs require large amounts of training data, which is potentially problematic in the domain of expensive cosmological simulations, and it is difficult to interpret the network. In this work we apply the learnable scattering transform, a kind of convolutional neural network that uses trainable wavelets as filters, to the problem of cosmological inference and marginalisation over astrophysical effects. We present two models based on the scattering transform, one constructed for performance and one constructed for interpretability, and perform a comparison with a CNN. 
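The GCS abstract above does not specify the algorithm's update rule, so the following is only one plausible toy reading: a derivative-free search that samples candidates from a Gaussian around the incumbent and gradually "crunches" (shrinks) the sampling scale. Every detail here (population size, shrink factor, iteration budget) is an assumption, not the paper's method:

```python
import numpy as np

def gcs_minimize(f, x0, sigma0=1.0, pop=50, iters=200, shrink=0.97, seed=0):
    """Toy Gaussian-sampling random search (one plausible reading of GCS).

    Samples a population around the incumbent from a Gaussian whose scale
    shrinks over time, keeping the best point seen so far.
    """
    rng = np.random.default_rng(seed)
    best_x, best_f, sigma = np.asarray(x0, float), f(x0), sigma0
    for _ in range(iters):
        cand = best_x + sigma * rng.standard_normal((pop, best_x.size))
        vals = np.apply_along_axis(f, 1, cand)
        i = vals.argmin()
        if vals[i] < best_f:
            best_x, best_f = cand[i], vals[i]
        sigma *= shrink                      # gradually crunch the search scale
    return best_x, best_f

# Minimize a shifted sphere function as a smoke test.
x, fx = gcs_minimize(lambda v: ((v - 3.0) ** 2).sum(), np.zeros(5))
print(x.round(2), fx)
```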
We find that scattering architectures are able to outperform a CNN, significantly so in the case of small training data samples. Additionally, we present a lightweight scattering network that is highly interpretable.", "output": "Learnable wavelet neural networks for cosmological inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cardiac cine MRI is the gold standard for cardiac functional assessment, but the inherently slow acquisition process creates the necessity of reconstruction approaches for accelerated undersampled acquisitions. Several regularization approaches that exploit spatial-temporal redundancy have been proposed to reconstruct undersampled cardiac cine MRI. More recently, methods based on supervised deep learning have also been proposed to further accelerate acquisition and reconstruction. However, these techniques usually rely on large datasets for training, which are not always available. In this work, we propose an unsupervised approach based on implicit neural field representations for cardiac cine MRI (so-called NF-cMRI). The proposed method was evaluated on in-vivo undersampled golden-angle radial multi-coil acquisitions for undersampling factors of 26x and 52x, achieving good image quality and comparable spatial and improved temporal depiction compared with a state-of-the-art reconstruction technique.", "output": "Unsupervised reconstruction of accelerated cardiac cine MRI using Neural Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst-case cost over the ambiguity set of probability distributions, has been widely applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE), with the itErative Active SEt method (EASE) to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., the constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge, and the iteration complexity is also analyzed. 
Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence and remain robust against data heterogeneity as well as malicious attacks, but also trade off robustness against performance.", "output": "Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ranking functions that are used in decision systems often produce disparate results for different populations because of bias in the underlying data. Addressing, and compensating for, these disparate outcomes is a critical problem for fair decision-making. Recent compensatory measures have mostly focused on opaque transformations of the ranking functions to satisfy fairness guarantees or on the use of quotas or set-asides to guarantee a minimum number of positive outcomes to members of underrepresented groups. In this paper we propose easily explainable data-driven compensatory measures for ranking functions. Our measures rely on the generation of bonus points given to members of underrepresented groups to address disparity in the ranking function. The bonus points can be set in advance and can be combined, allowing for consideration of the intersections of representations and giving better transparency to stakeholders. We propose efficient sampling-based algorithms to calculate the number of bonus points needed to minimize disparity. We validate our algorithms using real-world school admissions and recidivism datasets, and compare our results with those of existing fair ranking algorithms.", "output": "Explainable Disparity Compensation for Efficient Fair Ranking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The complex nature of big biological systems has pushed some scientists to classify understanding them as an inconceivable mission. Challenges at different levels complicate this task, one of which is the prediction of a protein's function. In recent years, significant progress has been made in this field through the development of various machine learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e., assigning predefined labels to proteins. In this work, we propose a novel approach, \\textbf{Prot2Text}, which predicts a protein's function in a free-text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types, including protein sequences, structures, and textual annotations. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate prediction of proteins' functions. 
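The bonus-point idea in the fair-ranking abstract above can be sketched with a toy grid search standing in for the paper's sampling-based algorithms; the scores, groups, and top-k disparity measure below are all illustrative assumptions:

```python
import numpy as np

def rank_with_bonus(scores, groups, bonus):
    """Re-rank candidates after adding a per-group bonus to raw scores."""
    adjusted = scores + np.array([bonus.get(g, 0.0) for g in groups])
    return np.argsort(-adjusted)            # indices from best to worst

def selection_disparity(order, groups, k, group):
    """|share of `group` in top-k minus its share in the population|."""
    top = [groups[i] for i in order[:k]]
    return abs(top.count(group) / k - groups.count(group) / len(groups))

scores = np.array([0.9, 0.8, 0.75, 0.7, 0.5, 0.4])
groups = ["a", "a", "a", "b", "b", "b"]

# Grid-search the smallest bonus for group "b" that removes top-4 disparity
# (a crude stand-in for the paper's sampling-based calculation).
for b in np.linspace(0.0, 0.5, 51):
    order = rank_with_bonus(scores, groups, {"b": b})
    if selection_disparity(order, groups, k=4, group="b") < 1e-9:
        print(f"bonus={b:.2f}, top-4={list(order[:4])}")
        break
```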
The code, the models, and a demo will be publicly released.", "output": "Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study presents a machine learning model based on the Naive Bayes classifier for predicting the level of depression in university students. The objective was to improve prediction accuracy using a machine learning model involving 70% training data and 30% validation data based on the Naive Bayes classifier. The collected data includes factors associated with depression from 519 university students. The results showed an accuracy of 78.03%, high sensitivity in detecting positive cases of depression, especially at moderate and severe levels, and significant specificity in correctly classifying negative cases. These findings highlight the effectiveness of the model in early detection and treatment of depression, benefiting vulnerable sectors and contributing to the improvement of mental health in the student population.", "output": "Prediction of depression status in college students using a Naive Bayes classifier based machine learning model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper analyzes representations of continuous piecewise linear functions with infinite width, finite cost shallow neural networks using the rectified linear unit (ReLU) as an activation function. Through its integral representation, a shallow neural network can be identified by the corresponding signed, finite measure on an appropriate parameter space. We map these measures on the parameter space to measures on the projective $n$-sphere cross $\\mathbb{R}$, allowing points in the parameter space to be bijectively mapped to hyperplanes in the domain of the function. We prove a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is expressible as a finite width shallow ReLU neural network.", "output": "Piecewise Linear Functions Representable with Infinite Width Shallow ReLU Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study provides a comprehensive time series analysis of daily industry-specific, country-wise CO$_2$ emissions from January 2019 to February 2023. The research focuses on the Power, Industry, Ground Transport, Domestic Aviation, and International Aviation sectors in European countries (EU27 & UK, Italy, Germany, Spain) and India, utilizing near-real-time activity data from the Carbon Monitor research initiative. To identify regular emission patterns, the data from the year 2020 is excluded due to the disruptive effects caused by the COVID-19 pandemic. The study then performs a principal component analysis (PCA) to determine the key contributors to CO$_2$ emissions. The analysis reveals that the Power, Industry, and Ground Transport sectors account for a significant portion of the variance in the dataset. A 7-day moving averaged dataset is employed for further analysis to facilitate robust predictions. This dataset captures both short-term and long-term trends and enhances the quality of the data for prediction purposes. 
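The depression-prediction setup above (Naive Bayes, a 70/30 split, accuracy/sensitivity/specificity) maps directly onto a few lines of scikit-learn; the synthetic data below merely stands in for the 519-student survey, which is not public:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the survey data (features -> depression status).
X, y = make_classification(n_samples=519, n_features=10, n_informative=6,
                           n_classes=2, random_state=42)

# 70% training / 30% validation split, as described in the abstract.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.30, random_state=42)

model = GaussianNB().fit(X_tr, y_tr)
pred = model.predict(X_va)
print(f"accuracy:    {accuracy_score(y_va, pred):.3f}")
print(f"sensitivity: {recall_score(y_va, pred):.3f}")           # true-positive rate
print(f"specificity: {recall_score(y_va, pred, pos_label=0):.3f}")
```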
The study utilizes Long Short-Term Memory (LSTM) models on the 7-day moving averaged dataset to effectively predict emissions and provide insights for policy decisions, mitigation strategies, and climate change efforts. During the training phase, the stability and convergence of the LSTM models are ensured, which guarantees their reliability in the testing phase. The evaluation of the loss function indicates this reliability. The model achieves high efficiency, as demonstrated by $R^2$ values ranging from 0.8242 to 0.995 for various countries and sectors. Furthermore, there is a proposal for utilizing scandium and boron/aluminium-based thin films as exceptionally efficient materials for capturing CO$_2$ (with a binding energy range from -3.0 to -3.5 eV). These materials are shown to surpass the affinity of graphene and boron nitride sheets in this regard.", "output": "Forecasting, capturing and activation of carbon-dioxide (CO$_2$): Integration of Time Series Analysis, Machine Learning, and Material Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of big data technology, data analysis has become increasingly important. Traditional clustering algorithms such as K-means are highly sensitive to the initial centroid selection and perform poorly on non-convex datasets. In this paper, we address these problems by proposing a data-driven Bregman divergence parameter optimization clustering algorithm (DBGSA), which incorporates the Universal Gravitational Algorithm to bring similar points closer together in the dataset. We construct a gravitational coefficient equation with a special property that gradually reduces the influence factor as the iteration progresses. Furthermore, we introduce the Bregman divergence generalized power mean information loss minimization to identify cluster centers and build a hyperparameter identification optimization model, which effectively solves the problems of manual adjustment and uncertainty in the improved dataset. Extensive experiments are conducted on four simulated datasets and six real datasets. The results demonstrate that DBGSA significantly improves the accuracy of various clustering algorithms by an average of 63.8% compared to other similar approaches like enhanced clustering algorithms and improved datasets. Additionally, a three-dimensional grid search was established to compare the effects of different parameter values within threshold conditions, and it was discovered that the parameter set provided by our model is optimal. This finding provides strong evidence of the high accuracy and robustness of the algorithm.", "output": "DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice. To tackle this challenge, active learning algorithms are commonly employed to select only the most relevant data for labeling. However, this is possible only when the quality and quantity of labels acquired from experts are sufficient. 
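The forecasting pipeline in the CO$_2$ abstract above (a 7-day moving average followed by an LSTM) can be sketched in PyTorch on a synthetic daily series; the window length, hidden size, and training schedule are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
import torch
import torch.nn as nn

# Synthetic daily "emissions" series standing in for the Carbon Monitor data.
rng = np.random.default_rng(0)
raw = 100 + 10 * np.sin(np.arange(800) * 2 * np.pi / 365) + rng.normal(0, 2, 800)
smooth = np.convolve(raw, np.ones(7) / 7, mode="valid")   # 7-day moving average

# Build (window -> next value) supervised pairs from the smoothed series.
win = 30
X = np.stack([smooth[i:i + win] for i in range(len(smooth) - win)])
y = smooth[win:]
X = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)    # (N, win, 1)
y = torch.tensor(y, dtype=torch.float32).unsqueeze(-1)    # (N, 1)

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # predict from the last hidden state

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(50):                    # short full-batch loop for illustration
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("final MSE:", float(loss))
```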
Unfortunately, in many applications, a trade-off between annotating individual samples by multiple annotators to increase label quality vs. annotating new samples to increase the total number of labeled instances is necessary. In this paper, we address the issue of faulty data annotations in the context of active learning. In particular, we propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space. The proposed methods require little to no intersection between samples annotated by different experts. Our experiments on four public datasets indicate the robustness and superiority of the proposed methods in both the estimation of the annotator's reliability and the assignment of actual labels, against state-of-the-art algorithms and simple majority voting.", "output": "Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not handle and multiple rounds of network communication to achieve state-of-the-art performance. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on edge and learning to ensemble them where data on edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance. Extensive experiments demonstrate that EdgeConvEns can outperform the state of the art with fewer communications and less data in various training scenarios.", "output": "EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data dependency of those methods but can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. 
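The annotation-unification algorithms in the active-learning abstract above are not specified, but the majority-voting baseline they are compared against is simple to state; a minimal sketch (handling sparse, per-sample annotator sets) is:

```python
from collections import Counter

def majority_vote(annotations):
    """Baseline label aggregation: per-sample majority over annotators.

    annotations: dict sample_id -> list of labels from different annotators
    (missing annotators simply contribute nothing for that sample).
    """
    return {s: Counter(labels).most_common(1)[0][0]
            for s, labels in annotations.items() if labels}

votes = {
    "x1": ["cat", "cat", "dog"],
    "x2": ["dog"],                 # sparse: only one annotator saw x2
    "x3": ["cat", "dog", "dog"],
}
print(majority_vote(votes))        # {'x1': 'cat', 'x2': 'dog', 'x3': 'dog'}
```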
First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.", "output": "When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies of four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID setting.", "output": "HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.", "output": "Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. 
In terms of extracting low-order structural information, graph isomorphism algorithms analyze the structural equivalence to reduce the solver space dimension, which demonstrates their power in many applications, such as protein design, chemical pathways, and community detection. For the more commonly occurring high-order relationships in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, the existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the abovementioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, which are the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. In order to fulfill our research objectives, a comprehensive set of experiments was meticulously designed, including seven graph classification datasets and 12 hypergraph classification datasets. Results on hypergraph classification datasets show significant improvements compared to other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, we found that our proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.", "output": "Hypergraph Isomorphism Computation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt pure data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture, named PDE-Net++, explicitly embedded with partial prior knowledge of the underlying PDEs. Furthermore, we introduce two distinct options called the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL) for the difference operators. Numerous numerical experiments have demonstrated that PDE-Net++ has superior prediction accuracy and better extrapolation performance than black-box models.", "output": "Learning to simulate partially known spatio-temporal dynamics with trainable difference operators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots, and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g., in healthcare applications. We discuss background, challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. 
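The hypergraph generalization above builds on the classical 1-WL (Weisfeiler-Lehman) color refinement test for graphs, which is compact enough to sketch in pure Python; this is the standard graph version, not the authors' hypergraph extension:

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL (Weisfeiler-Lehman) color refinement on a graph.

    adj: dict node -> list of neighbours. Returns a color histogram that
    can be compared between two graphs; differing histograms prove
    non-isomorphism (equal histograms are inconclusive).
    """
    colors = {v: 0 for v in adj}                     # uniform initial colors
    for _ in range(rounds):
        # New color = own color plus the multiset of neighbour colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        relabel = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
        colors = {v: relabel[sigs[v]] for v in adj}
    return Counter(colors.values())

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}        # path on 4 nodes
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}        # star on 4 nodes
print(wl_colors(path) == wl_colors(star))            # False: not isomorphic
```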
Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: ", "output": "A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at", "output": "Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is often useful to perform integration over learned functions represented by neural networks. However, this integration is usually performed numerically, as analytical integration over learned functions (especially neural networks) is generally viewed as intractable. In this work, we present a method for representing the analytical integral of a learned function $f$. This allows the exact integral of a neural network to be computed, and enables constrained neural networks to be parametrised by applying constraints directly to the integral. Crucially, we also introduce a method to constrain $f$ to be positive, a necessary condition for many applications (e.g., probability distributions, distance metrics, etc.). Finally, we introduce several applications where our fixed-integral neural network (FINN) can be utilised.", "output": "Fixed Integral Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Big data and machine learning tools have jointly empowered humans in making data-driven decisions. 
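One simple way to realize the FINN idea sketched above (an exact, quadrature-free integral plus a positivity constraint on f) is to learn the antiderivative F directly with a monotone network, so that f = F' is positive by construction and the integral of f over [a, b] is exactly F(b) - F(a). Whether this matches the paper's actual construction is an assumption; the PyTorch sketch below only illustrates the principle:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneF(nn.Module):
    """Increasing scalar network F(x); f = F' is then positive, and the
    exact integral of f over [a, b] is F(b) - F(a), with no quadrature."""
    def __init__(self, hidden=32):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden))
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Softplus keeps both weight layers positive; positive weights with
        # an increasing activation (tanh) make F monotonically increasing.
        h = torch.tanh(x @ F.softplus(self.w1).t() + self.b1)
        return h @ F.softplus(self.w2).t() + self.b2

net = MonotoneF()
x = torch.linspace(-1.0, 1.0, 5, requires_grad=True).unsqueeze(1)
F_x = net(x)
# f(x) = dF/dx via autograd: guaranteed positive everywhere.
f_x = torch.autograd.grad(F_x.sum(), x, create_graph=True)[0]
# Exact integral of f over [a, b], no numerical quadrature needed.
a, b = torch.tensor([[-1.0]]), torch.tensor([[1.0]])
integral = net(b) - net(a)
print(f_x.min().item() > 0, integral.item())
```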
However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon, where aggregated and subgroup-level associations contradict each other, causing cognitive confusion and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed \"de-paradox\" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.", "output": "VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that aids in predicting potential maintenance needs based on sensor data collected from these vehicles. The proposed model's architecture involves various models such as Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier, and Gradient Boosting to predict the maintenance requirements of the vehicles accurately. In addition, K-fold cross-validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, precision of 99.80%, and recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.", "output": "Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has proposed and explored using coreset techniques for quantum algorithms that operate on classical data sets to accelerate the applicability of these algorithms on near-term quantum devices. We apply these ideas to Quantum Boltzmann Machines (QBM), where gradient-based steps which require Gibbs state sampling are the main computational bottleneck during training. 
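The ensemble-plus-K-fold evaluation described in the predictive-maintenance abstract above is straightforward to sketch with scikit-learn; note that the synthetic data stands in for the (unavailable) vehicle sensor logs, LightGBM is omitted to avoid an extra dependency, and the TOPSIS analysis is left out:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for vehicle sensor data (fault vs. no fault).
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

# Soft-voting ensemble over several tree-based models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
        ("dt", DecisionTreeClassifier(random_state=7)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=7)),
        ("gb", GradientBoostingClassifier(random_state=7)),
    ],
    voting="soft",
)

# K-fold cross-validation to check the ensemble's stability.
scores = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print(scores.round(3), "mean:", scores.mean().round(3))
```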
By using a coreset in place of the full data set, we try to minimize the number of steps needed and accelerate the overall training time. In a regime where computational time on quantum computers is a precious resource, we propose this might lead to substantial practical savings. We evaluate this approach on 6x6 binary images from an augmented bars and stripes data set using a QBM with 36 visible units and 8 hidden units. Using an Inception-score-inspired metric, we compare QBM training times with and without using coresets.", "output": "Training Quantum Boltzmann Machines with Coresets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has shown that Machine Learning (ML) programs are error-prone and called for contracts for ML code. Contracts, as in the design-by-contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages in the ML pipeline. We describe an empirical study of posts on Stack Overflow of the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to understand the following questions. What are the root causes and effects behind ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect the violations in early ML pipeline stages? Our key findings are that the most commonly needed contracts for ML APIs are either checking constraints on single arguments of an API or on the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts to promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.", "output": "What Kinds of Contracts Do ML APIs Need?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we bound a machine's ability to learn based on computational limitations implied by physicality. We start by considering the information processing capacity (IPC), a normalized measure of the expected squared error of a collection of signals to a complete basis of functions. We use the IPC to measure the degradation under noise of the performance of reservoir computers, a particular kind of recurrent network, when constrained by physical considerations. First, we show that the IPC is at most a polynomial in the system size $n$, even when considering the collection of $2^n$ possible pointwise products of the $n$ output signals. Next, we argue that this degradation implies that the family of functions represented by the reservoir requires an exponential number of samples to learn in the presence of the reservoir's noise. 
Finally, we conclude with a discussion of the performance of the same collection of $2^n$ functions without noise when being used for binary classification.", "output": "Limits to Reservoir Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Purpose: This study evaluated the out-of-domain performance and generalization capabilities of automated medical image segmentation models, with a particular focus on adaptation to new image acquisitions and disease type. Materials: Datasets from both non-contrast and contrast-enhanced abdominal CT scans of healthy patients and those with polycystic kidney disease (PKD) were used. A total of 400 images (100 non-contrast controls, 100 contrast controls, 100 non-contrast PKD, 100 contrast PKD) were utilized for training/validation of models to segment kidneys, livers, and spleens, and the final models were then tested on 100 non-contrast CT images of patients affected by PKD. Performance was evaluated using Dice, Jaccard, TPR, and Precision. Results: Models trained on a diverse range of data showed no worse performance than models trained exclusively on in-domain data when tested on in-domain data. For instance, the Dice similarity of the model trained on 25% from each dataset was found to be non-inferior to that of the model trained purely on in-domain data. Conclusions: The results indicate that broader training examples significantly enhance model generalization and out-of-domain performance, thereby improving automated segmentation tools' applicability in clinical settings. The study's findings provide a roadmap for future research to adopt a data-centric approach in medical image AI model development.", "output": "Role of Image Acquisition and Patient Phenotype Variations in Automatic Segmentation Model Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. A high-performance graph embedding architecture leveraging Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory is presented that simplifies the graph embedding problem and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.", "output": "HUGE: Huge Unsupervised Graph Embeddings with TPUs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. 
Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.", "output": "A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work in the field of speech enhancement (SE) has involved the use of self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, in prior work, very little attention has been paid to the relationship between the language of the audio used to train the self-supervised representation and that used to train the SE system. Enhancement models trained using a loss function which incorporates a self-supervised representation that shares exactly the language of the noisy data used to train the SE system show better performance than those which do not match exactly. This may lead to enhancement systems which are language specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time domain loss functions. In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which themselves are trained using different language combinations and with differing network structures as loss function representations. These models are then tested across unseen languages and their performances are analysed. It is found that the training language of the self-supervised representation appears to have a minor effect on enhancement performance; the amount of training data of a particular language, however, greatly affects performance.", "output": "The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rapid growth of applying Machine Learning (ML) in different domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., a software component operating based on ML. 
Understanding the bug characteristics and maintenance challenges in ML-based systems can help developers of these systems to identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the difference between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to indicate whether they affect ML components or not. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms, and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs, in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code). Based on our results, fixing ML bugs is more costly and ML components are more error-prone, compared to non-ML bugs and non-ML components respectively. Hence, paying significant attention to the reliability of the ML components is crucial in ML-based systems.", "output": "Bug Characterization in Machine Learning-based Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset and despite achieving performance that is statistically equivalent to the state-of-the-art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people), and false negatives (e.g., failing to identify members of the search team). 
The poor results in practice for algorithms that showed good results on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that are capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.", "output": "Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. This $\\texttt{SPS}_+$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method, where the step size is enforced to be positive. We then show that $\\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in the Lipschitz non-smooth setting. We then move on to develop $\\texttt{FUVAL}$, a variant of $\\texttt{SPS}_+$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $\\texttt{FUVAL}$: as a projection-based method, as a variant of the prox-linear method, and then as a particular online SGD method. We then present a convergence analysis of $\\texttt{FUVAL}$ and experimental results. One shortcoming of our work is that the convergence analysis of $\\texttt{FUVAL}$ shows no advantage over SGD. Another shortcoming is that currently only the full batch version of $\\texttt{FUVAL}$ shows a minor advantage over GD (Gradient Descent) in terms of sensitivity to the step size. The stochastic version shows no clear advantage over SGD. We conjecture that large mini-batches are required to make $\\texttt{FUVAL}$ competitive. Currently the new $\\texttt{FUVAL}$ method studied in this paper does not offer any clear theoretical or practical advantage. We have chosen to make this draft available online nonetheless because of some of the analysis techniques we use, such as the non-smooth analysis of $\\texttt{SPS}_+$, and also to show an apparently interesting approach that currently does not work.", "output": "Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider the Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. 
Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.", "output": "Optimal Estimation in Mixed-Membership Stochastic Block Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method, which alters the trajectory of GD. As a result, this allows for a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.", "output": "Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $\\mathcal{O}(kN^2\\sqrt{T\\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.", "output": "Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables \"in the wild\". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. 
We compile an extensive benchmark for this new task by collecting 194 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.", "output": "Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to real autonomous systems without considering the safety constraints. The latter are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study, we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and makes fewer collisions while training, without sacrificing the overall performance.", "output": "Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by effectively capturing global feature correlations. However, the integration of the Transformer modules may result in the loss of local contextual information during the global feature fusion process. To overcome these challenges, we propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA). The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures the local correlations using multiple Multi-scale Cross Perceptron modules, facilitating the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure to address the semantic segmentation of the image involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including the open large-scale dataset of CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance. 
The code is available at", "output": "MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traffic forecasting, which aims to predict traffic conditions based on historical observations, has been an enduring research topic and is widely recognized as an essential component of intelligent transportation. Recent proposals on Spatial-Temporal Graph Neural Networks (STGNNs) have made significant progress by combining sequential models with graph convolution networks. However, due to high complexity issues, STGNNs only focus on short-term traffic forecasting, e.g., 1-hour forecasting, while ignoring more practical long-term forecasting. In this paper, we make the first attempt to explore long-term traffic forecasting, e.g., 1-day forecasting. To this end, we first reveal its unique challenges in exploiting multi-scale representations. Then, we propose a novel Hierarchical U-net TransFormer (HUTFormer) to address the issues of long-term traffic forecasting. HUTFormer consists of a hierarchical encoder and decoder to jointly generate and utilize multi-scale representations of traffic data. Specifically, for the encoder, we propose window self-attention and segment merging to extract multi-scale representations from long-term traffic data. For the decoder, we design a cross-scale attention mechanism to effectively incorporate multi-scale representations. In addition, HUTFormer employs an efficient input embedding strategy to address the complexity issues. Extensive experiments on four traffic datasets show that the proposed HUTFormer significantly outperforms state-of-the-art traffic forecasting and long time series forecasting baselines.", "output": "HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that, when used in conditional separation models, can achieve impressive performance. Most approaches along these lines have focused on simple descriptions, which are not always useful for varying types of input mixtures. In this work, we present an approach in which a model, given an input mixture and partial semantic information about a target source, is trained to extract additional semantic data. We then leverage this pre-trained model to improve the separation performance of an uncoupled multi-conditional separation network. Our experiments demonstrate that the separation performance of this multi-conditional model is significantly improved, approaching the performance of an oracle model with complete semantic information. 
Furthermore, our approach achieves performance levels that are comparable to those of the best performing specialized single conditional models, thus providing an easier-to-use alternative.", "output": "Complete and separate: Conditional separation with missing target source attribute completion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Augmentation techniques and sampling strategies are crucial in contrastive learning, but in most existing works, augmentation techniques require careful design, and their sampling strategies can only capture a small amount of intrinsic supervision information. Additionally, the existing methods require complex designs to obtain two different representations of the data. To overcome these limitations, we propose a novel framework called the Self-Contrastive Graph Diffusion Network (SCGDN). Our framework consists of two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to get an excellent embedding, while DiFM balances the state of each node in the graph through Laplacian diffusion learning and allows the cooperative evolution of adjacency and feature information in the graph. Unlike existing methodologies, SCGDN is an augmentation-free approach that avoids \"sampling bias\" and semantic drift, without the need for pre-training. We conduct a high-quality sampling of samples based on structure and feature information. If two nodes are neighbors, they are considered positive samples of each other. If two disconnected nodes are also unrelated on the $k$NN graph, they are considered negative samples for each other. The contrastive objective reasonably uses our proposed sampling strategies, and the redundancy reduction term minimizes redundant information in the embedding and can well retain more discriminative information. In this novel framework, the graph self-contrastive learning paradigm gives expression to a powerful force. SCGDN effectively balances between preserving high-order structure information and avoiding overfitting. The results show that SCGDN can consistently outperform both the contrastive methods and the classical methods.", "output": "Self-Contrastive Graph Diffusion Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a theoretical framework for studying the imitation of stochastic, non-Markovian, potentially multi-modal (i.e., \"complex\") expert demonstrations in nonlinear dynamical systems. Our framework invokes low-level controllers - either learned or implicit in position-command control - to stabilize imitation policies around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a stochastic continuity property of the learned policy we call \"total variation continuity\" (TVC), an imitator that accurately estimates actions on the demonstrator's state distribution closely matches the demonstrator's distribution over entire trajectories. We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. 
We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations.", "output": "Imitating Complex Trajectories: Bridging Low-Level Stability and High-Level Behavior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of phase change phenomena, the lack of accessible and diverse datasets suitable for machine learning (ML) training poses a significant challenge. Existing experimental datasets are often restricted, with limited availability and sparse ground truth data, impeding our understanding of this complex multi-physics phenomenon. To bridge this gap, we present the BubbleML Dataset( which leverages physics-driven simulations to provide accurate ground truth information for various boiling scenarios, encompassing nucleate pool boiling, flow boiling, and sub-cooled boiling. This extensive dataset covers a wide range of parameters, including varying gravity conditions, flow rates, sub-cooling levels, and wall superheat, comprising 51 simulations. BubbleML is validated against experimental observations and trends, establishing it as an invaluable resource for ML research. Furthermore, we showcase its potential to facilitate the exploration of diverse downstream tasks by introducing two benchmarks: (a) optical flow analysis to capture bubble dynamics, and (b) operator networks for learning temperature dynamics. The BubbleML dataset and its benchmarks serve as a catalyst for advancements in ML-driven research on multi-physics phase change phenomena, enabling the development and comparison of state-of-the-art techniques and models.", "output": "BubbleML: A Multi-Physics Dataset and Benchmarks for Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need for sequential testing for early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. 
We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.", "output": "Rapid and Scalable Bayesian AB Testing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports, leading to a more responsible use of AI in expediting clinical workflows.", "output": "Fact-Checking of AI-Generated Reports."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called \"linear\") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator and provides explicit non-asymptotic complexity guarantees for both.", "output": "Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How to accurately measure the relevance and redundancy of features is an age-old challenge in the field of feature selection. However, existing filter-based feature selection methods cannot directly measure redundancy for continuous data. In addition, most methods rely on manually specifying the number of features, which may introduce errors in the absence of expert knowledge. 
In this paper, we propose a non-parametric feature selection algorithm based on maximum inter-class variation and minimum redundancy, abbreviated as MVMR-FS. We first introduce supervised and unsupervised kernel density estimation on the features to capture their similarities and differences in inter-class and overall distributions. Subsequently, we present the criteria for maximum inter-class variation and minimum redundancy (MVMR), wherein the inter-class probability distributions are employed to reflect feature relevance and the distances between overall probability distributions are used to quantify redundancy. Finally, we employ an AGA to search for the feature subset that minimizes the MVMR. Compared with ten state-of-the-art methods, MVMR-FS achieves the highest average accuracy and improves the accuracy by 5% to 11%.", "output": "MVMR-FS : Non-parametric feature selection algorithm based on Maximum inter-class Variation and Minimum Redundancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis. Considering the wavelet transform represents the image in spatial and frequency domains, we carefully design a novel architecture SFUNet to effectively capture the correlation for both domains. Specifically, in the standard denoising U-Net for pixel data, we supplement the 2D convolutions and spatial-only attention layers with our spatial frequency-aware convolution and attention modules to jointly model the complementary information from spatial and frequency domains in wavelet data. Our new architecture can be used as a drop-in replacement to the pixel-based network and is compatible with the vanilla DDPM training process. By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on CIFAR-10, FFHQ, LSUN-Bedroom, and LSUN-Church datasets than the pixel-based counterpart.", "output": "Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advancement in stochastic thermodynamics which allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance and the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable neural networks, e.g., Neural Tangent Kernel (NTK). Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense. 
Our results are consistent with small-scale experiments with Convolutional Neural Networks (CNNs) and Fully Connected Neural Networks (FCNs) on CIFAR-10, showing a short highly non-optimal regime followed by a longer optimal regime.", "output": "Speed Limits for Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Heatwaves and bushfires cause substantial impacts on society and ecosystems across the globe. Accurate information on heat extremes is needed to support the development of actionable mitigation and adaptation strategies. Regional climate models are commonly used to better understand the dynamics of these events. These models have very large input parameter sets, and the parameters within the physics schemes substantially influence the model's performance. However, parameter sensitivity analysis (SA) of regional models for heat extremes is largely unexplored. Here, we focus on the southeast Australian region, one of the global hotspots of heat extremes. In southeast Australia, the Weather Research and Forecasting (WRF) model is the widely used regional model to simulate extreme weather events across the region. Hence, in this study, we focus on the sensitivity of WRF model parameters to surface meteorological variables such as temperature, relative humidity, and wind speed during two extreme heat events over southeast Australia. Due to the presence of multiple parameters and their complex relationship with output variables, a machine learning (ML) surrogate-based global sensitivity analysis method is considered for the SA. The ML surrogate-based Sobol SA is used to identify the sensitivity of 24 adjustable parameters in seven different physics schemes of the WRF model. Results show that out of these 24, only three parameters, namely the scattering tuning parameter, multiplier of saturated soil water content, and profile shape exponent in the momentum diffusivity coefficient, are important for the considered meteorological variables. These SA results are consistent for the two different extreme heat events. Further, we investigated the physical significance of sensitive parameters. This study's results will help in further optimising WRF parameters to improve model simulation.", "output": "Machine Learning based Parameter Sensitivity of Regional Climate Models -- A Case Study of the WRF Model for Heat Extremes over Southeast Australia."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many studies have proposed machine-learning (ML) models for malware detection and classification, reporting an almost-perfect performance. However, they assemble ground-truth in different ways, use diverse static- and dynamic-analysis techniques for feature extraction, and even differ on what they consider a malware family. As a consequence, our community still lacks an understanding of malware classification results: whether they are tied to the nature and distribution of the collected dataset, to what extent the number of families and samples in the training dataset influence performance, and how well static and dynamic features complement each other. This work sheds light on those open questions by investigating the key factors influencing ML-based malware detection and classification. 
For this, we collect the largest balanced malware dataset so far, with 67K samples from 670 families (100 samples each), and train state-of-the-art models for malware detection and family classification using our dataset. Our results reveal that static features perform better than dynamic features, and that combining both only provides marginal improvement over static features. We discover no correlation between packing and classification accuracy, and that missing behaviors in dynamically-extracted features highly penalize their performance. We also demonstrate how a larger number of families to classify makes the classification harder, while a higher number of samples per family increases accuracy. Finally, we find that models trained on a uniform distribution of samples per family better generalize on unseen data.", "output": "Decoding the Secrets of Machine Learning in Malware Classification: A Deep Dive into Datasets, Feature Extraction, and Model Performance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Algorithmic fairness has been a serious concern and received lots of interest in the machine learning community. In this paper, we focus on the bipartite ranking scenario, where the instances come from either the positive or negative class and the goal is to learn a ranking function that ranks positive instances higher than negative ones. While there could be a trade-off between fairness and performance, we propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking and maintaining the algorithm classification performance. In particular, we optimize a weighted sum of the utility as identifying an optimal warping path across different protected groups and solve it through a dynamic programming process. xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics. In addition to binary groups, xOrder can be applied to multiple protected groups. We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories. xOrder consistently achieves a better balance between the algorithm utility and ranking fairness on a variety of datasets with different metrics. From the visualization of the calibrated ranking scores, xOrder mitigates the score distribution shifts of different groups compared with baselines. Moreover, additional analytical results verify that xOrder achieves a robust performance when faced with fewer samples and a bigger difference between training and testing ranking score distributions.", "output": "Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ever-growing use of wind energy makes necessary the optimization of turbine operations through pitch angle controllers and their maintenance with early fault detection. It is crucial to have accurate and robust models imitating the behavior of wind turbines, especially to predict the generated power as a function of the wind speed. Existing empirical and physics-based models have limitations in capturing the complex relations between the input variables and the power, aggravated by wind variability. 
Data-driven methods offer new opportunities to enhance wind turbine modeling of large datasets by improving accuracy and efficiency. In this study, we used physics-informed neural networks to reproduce historical data coming from 4 turbines in a wind farm, while imposing certain physical constraints on the model. The developed models for regression of the power, torque, and power coefficient as output variables showed great accuracy for both real data and physical equations governing the system. Lastly, introducing an efficient evidential layer provided uncertainty estimations of the predictions, proved to be consistent with the absolute error, and made possible the definition of a confidence interval in the power curve.", "output": "Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time series forecasting lies at the core of important real-world applications in many fields of science and engineering. The abundance of large time series datasets that consist of complex patterns and long-term dependencies has led to the development of various neural network architectures. Graph neural network approaches, which jointly learn a graph structure based on the correlation of raw values of multivariate time series while forecasting, have recently seen great success. However, such solutions are often costly to train and difficult to scale. In this paper, we propose TimeGNN, a method that learns dynamic temporal graph representations that can capture the evolution of inter-series patterns along with the correlations of multiple series. TimeGNN achieves inference times 4 to 80 times faster than other state-of-the-art graph-based methods while achieving comparable forecasting performance.", "output": "TimeGNN: Temporal Dynamic Graph Learning for Time Series Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved by either designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs in four biomedical tasks and a diverse range of distribution shifts. Based on the result that none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. On the basis of various examples, we demonstrate how this tool can help researchers gain insight into the requirements for safe application of classification systems in the medical domain. 
The open-source benchmark and tool are at:", "output": "Understanding Silent Failures in Medical Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Complex interactions between two opposing agents frequently occur in domains of machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender should attempt to block the shot, play a crucial role in the outcome of the game. However, there are currently no effective data-driven and/or theory-based approaches to analyzing such situations. To address this issue, we proposed a novel framework to analyze such scenarios based on game theory, where we estimate the expected payoff with machine learning (ML) models, and additional features for ML models were extracted with a theory-based shot block model. Conventionally, successes or failures (1 or 0) are used as payoffs, while a successful shot (goal) is extremely rare in football. Therefore, we proposed the Expected Probability of Shot On Target (xSOT) metric to evaluate players' actions even if the shot results in no goal; this allows for effective differentiation and comparison between different shots and even enables counterfactual shot situation analysis. In our experiments, we have validated the framework by comparing it with baseline and ablated models. Furthermore, we have observed a high correlation between the xSOT and existing metrics. This alignment of information suggests that xSOT provides valuable insights. Lastly, as an illustration, we studied optimal strategies in the World Cup 2022 and analyzed a shot situation in EURO 2020.", "output": "A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic inpainting or image completion alludes to the task of inferring arbitrarily large missing regions in images based on image semantics. Since the prediction of image pixels requires an indication of high-level context, this makes it significantly tougher than image completion, which is often more concerned with correcting data corruption and removing entire objects from the input image. On the other hand, image enhancement attempts to eliminate unwanted noise and blur from the image, along with sustaining most of the image details. An efficient image completion and enhancement model should be able to recover the corrupted and masked regions in images and then refine the image further to increase the quality of the output image. Generative Adversarial Networks (GANs) have turned out to be helpful in picture completion tasks. 
In this chapter, we will discuss the underlying GAN architecture and how they can be used for image completion tasks.", "output": "Semantic Image Completion and Enhancement using GANs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose FLARE, the first fingerprinting mechanism to verify whether a suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of another (victim) policy. We first show that it is possible to find non-transferable, universal adversarial masks, i.e., perturbations, to generate adversarial examples that can successfully transfer from a victim policy to its modified versions but not to independently trained policies. FLARE employs these masks as fingerprints to verify the true ownership of stolen DRL policies by measuring an action agreement value over states perturbed via such masks. Our empirical evaluations show that FLARE is effective (100% action agreement on stolen copies) and does not falsely accuse independent policies (no false positives). FLARE is also robust to model modification attacks and cannot be easily evaded by more informed adversaries without negatively impacting agent performance. We also show that not all universal adversarial masks are suitable candidates for fingerprints due to the inherent characteristics of DRL policies. The spatio-temporal dynamics of DRL problems and the sequential decision-making process make characterizing the decision boundary of DRL policies more difficult, as well as searching for universal masks that capture its geometry.", "output": "FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As public consciousness regarding the collection and use of personal information by corporations grows, it is of increasing importance that consumers be active participants in the curation of corporate datasets. In light of this, data governance frameworks such as the General Data Protection Regulation (GDPR) have outlined the right to be forgotten as a key principle allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to address the computational inefficiencies of retraining a model from scratch with each unlearning request. While these are efficient online alternatives to retraining, it is unclear how these methods impact other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results which demonstrate that our method can provably unlearn data instances while maintaining fairness objectives. 
Extensive experimentation with real-world datasets highlights the efficacy of our method at unlearning data instances while preserving fairness.", "output": "Fair Machine Unlearning: Data Removal while Mitigating Disparities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a growing awareness of the harmful effects of distribution shift on the performance of deployed machine learning models. Consequently, there is a growing interest in detecting these shifts before associated costs have time to accumulate. However, desiderata of crucial importance to the practicable deployment of sequential shift detectors are typically overlooked by existing works, precluding their widespread adoption. We identify three such desiderata, highlight existing works relevant to their satisfaction, and recommend impactful directions for future research.", "output": "Towards Practicable Sequential Shift Detectors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house. Efficient and accurate load monitoring facilitates user profile establishment, intelligent household energy management, and peak load shifting. This is beneficial for both the end-users and utilities by improving the overall efficiency of a power distribution network. Existing approaches mainly focus on developing an individual model for each appliance. Those approaches typically rely on a large amount of household-labeled data, which is hard to collect. In this paper, we propose a multi-appliance-task framework with a training-efficient sample augmentation (SA) scheme that boosts the disaggregation performance with limited labeled data. For each appliance, we develop a shared-hierarchical split structure for its regression and classification tasks. In addition, we also propose a two-dimensional attention mechanism in order to capture spatio-temporal correlations among all appliances. With only one-day training data and limited appliance operation profiles, the proposed SA algorithm can achieve comparable test performance to the case of training with the full dataset. Finally, simulation results show that our proposed approach features a significantly improved performance over many baseline models. The relative errors can be reduced by more than 50% on average. The codes of this work are available at", "output": "MATNilm: Multi-appliance-task Non-intrusive Load Monitoring with Limited Labeled Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. 
Our code for inference, trained models, and datasets are available online.", "output": "Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous systems in the road transportation network require intelligent mechanisms that cope with uncertainty to foresee the future. In this paper, we propose a multi-stage probabilistic approach for trajectory forecasting: trajectory transformation to displacement space, clustering of displacement time series, trajectory proposals, and ranking proposals. We introduce a new deep feature clustering method, underlying self-conditioned GAN, which copes better with distribution shifts than traditional methods. Additionally, we propose novel distance-based ranking proposals to assign probabilities to the generated trajectories that are more efficient yet accurate than an auxiliary neural network. The overall system surpasses context-free deep generative models in human and road agents trajectory data while performing similarly to point estimators when comparing the most probable trajectory.", "output": "Likely, Light, and Accurate Context-Free Clusters-based Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Residual connections have been proposed as an architecture-based inductive bias to mitigate the problem of exploding and vanishing gradients and increase task performance in both feed-forward and recurrent networks (RNNs) when trained with the backpropagation algorithm. Yet, little is known about how residual connections in RNNs influence their dynamics and fading memory properties. Here, we introduce weakly coupled residual recurrent networks (WCRNNs) in which residual connections result in well-defined Lyapunov exponents and allow for studying properties of fading memory. We investigate how the residual connections of WCRNNs influence their performance, network dynamics, and memory properties on a set of benchmark tasks. We show that several distinct forms of residual connections yield effective inductive biases that result in increased network expressivity. In particular, residual connections that (i) result in network dynamics at the proximity of the edge of chaos, (ii) allow networks to capitalize on characteristic spectral properties of the data, and (iii) result in heterogeneous memory properties are shown to increase practical expressivity. In addition, we demonstrate how our results can be extended to non-linear residuals, and introduce a weakly coupled residual initialization scheme that can be used for Elman RNNs.", "output": "Fading memory as inductive bias in residual recurrent networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Normalising Flows are generative models characterised by their invertible architecture. However, the requirement of invertibility imposes constraints on their expressiveness, necessitating a large number of parameters and innovative architectural designs to achieve satisfactory outcomes. Whilst flow-based models predominantly rely on neural-network-based transformations for expressive designs, alternative transformation methods have received limited attention. 
In this work, we present Ferumal flow, a novel kernelised normalising flow paradigm that integrates kernels into the framework. Our results demonstrate that a kernelised flow can yield competitive or superior results compared to neural network-based flows whilst maintaining parameter efficiency. Kernelised flows excel especially in the low-data regime, enabling flexible non-parametric density estimation in applications with sparse data availability.", "output": "Kernelised Normalising Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches in 7 brain network datasets and compare the counterfactual statements generated according to several widely-used metrics. Results confirm that adopting a semantic-relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods.", "output": "Counterfactual Explanations for Graph Classification Through the Lenses of Density."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Turbulence parametrizations will remain a necessary building block in kilometer-scale Earth system models. In convective boundary layers, where the mean vertical gradients of conserved properties such as potential temperature and moisture are approximately zero, the standard ansatz which relates turbulent fluxes to mean vertical gradients via an eddy diffusivity has to be extended by mass flux parametrizations for the typically asymmetric up- and downdrafts in the atmospheric boundary layer. In this work, we present a parametrization for a dry convective boundary layer based on a generative adversarial network. The model incorporates the physics of self-similar layer growth following from the classical mixed layer theory by Deardorff. This enhances the training database of the generative machine learning algorithm and thus significantly improves the predicted statistics of the synthetically generated turbulence fields at different heights inside the boundary layer. The algorithm training is based on fully three-dimensional direct numerical simulation data.
In contrast to stochastic parametrizations, our model is able to predict the highly non-Gaussian transient statistics of buoyancy fluctuations, vertical velocity, and buoyancy flux at different heights, thus also capturing the fastest thermals penetrating into the stabilized top region. The results of our generative algorithm agree with standard two-equation or multi-plume stochastic mass-flux schemes. The present parametrization additionally provides the granule-type horizontal organization of the turbulent convection which cannot be obtained in any of the other model closures. Our work paves the way to efficient data-driven convective parametrizations in other natural flows, such as moist convection, upper ocean mixing, or convection in stellar interiors.", "output": "Generative convective parametrization of dry atmospheric boundary layer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualise different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at The demonstration video can be found at", "output": "CodeLens: An Interactive Tool for Visualizing Code Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code at and an anonymized dataset at", "output": "Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual AI systems are vulnerable to natural and synthetic physical corruption in the real world. Such corruption often arises unexpectedly and alters the model's performance.
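As a small aside on the CodeLens abstract above: two of the four code representations it lists (token sequence and AST) can be produced with Python's standard library alone. The sketch below is a hedged illustration, not the CodeLens tool itself.

```python
# Hedged illustration of two of the four code representations the CodeLens
# abstract lists (token sequence and AST), using only Python's standard
# library; this is not the CodeLens tool itself.
import ast
import io
import tokenize

src = "def add(a, b):\n    return a + b\n"

# 1) Sequence of tokens.
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(src).readline)
          if tok.string.strip()]
print("tokens:", tokens)

# 2) Abstract syntax tree.
tree = ast.parse(src)
print(ast.dump(tree, indent=2))
```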
In recent years, the primary focus has been on adversarial attacks. However, natural corruptions (e.g., snow, fog, dust) are an omnipresent threat to visual AI systems and should be considered equally important. Many existing works propose interesting solutions to train robust models against natural corruption. These works either leverage image augmentations, which come with the additional cost of model training, or place suspicious patches in the scene to design unadversarial examples. In this work, we propose the idea of naturalistic support artifacts (NSA) for robust prediction. The NSAs are shown to be beneficial in scenarios where model parameters are inaccessible and adding artifacts in the scene is feasible. The NSAs are natural-looking objects generated through artifact training using DC-GAN to have high visual fidelity in the scene. We test against natural corruptions on the Imagenette dataset and observe the improvement in prediction confidence score by four times. We also demonstrate NSA's capability to increase adversarial accuracy by 8% on average. Lastly, we qualitatively analyze NSAs using saliency maps to understand how they help improve prediction confidence.", "output": "NSA: Naturalistic Support Artifact to Boost Network Confidence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Performance benchmarking of HPC systems is an ongoing effort that seeks to provide information that will allow for increased performance and improve the job schedulers that manage these systems. We develop a benchmarking tool that utilizes machine learning models and gathers performance data on GPU-accelerated nodes while they perform material segmentation analysis. The benchmark uses an ML model that has been converted from Caffe to PyTorch using the MMdnn toolkit and the MINC-2500 dataset. Performance data is gathered on two ERDC DSRC systems, Onyx and Vulcanite. The data reveals that while Vulcanite has faster model times in a large number of benchmarks, it is also more subject to some environmental factors that can cause slower performance than Onyx. In contrast, the model times from Onyx are consistent across benchmarks.", "output": "Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graphs can be leveraged to model polyphonic multitrack symbolic music, where notes, chords and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, there is a lack of works that consider graph representations in the context of deep learning systems for music generation. This paper bridges this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times. This opens the door to a new form of human-computer interaction in the context of music co-creation.
After training the model on existing MIDI datasets, the experiments show that the model is able to generate appealing short and long musical sequences and to realistically interpolate between them, producing music that is tonally and rhythmically consistent. Finally, the visualization of the embeddings shows that the model is able to organize its latent space in accordance with known musical concepts.", "output": "Graph-based Polyphonic Multitrack Music Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data profiling is an essential process in modern data-driven industries. One of its critical components is the discovery and validation of complex statistics, including functional dependencies, data constraints, association rules, and others. However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists. This creates a significant barrier to the adoption of these tools in the industry. Moreover, existing systems were not created with industrial-grade workloads in mind. Finally, they do not aim to provide descriptive explanations, i.e. why a given pattern is not found. It is a significant issue as it is essential to understand the underlying reasons for a specific pattern's absence to make informed decisions based on the data. Because of that, these patterns effectively rest in thin air: their application scope is rather limited, and they are rarely used by the broader public. At the same time, as we are going to demonstrate in this presentation, complex statistics can be efficiently used to solve many classic data quality problems. Desbordante is an open-source data profiler that aims to close this gap. It is built with emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations. Furthermore, it provides seamless Python integration by offloading various costly operations to the C++ core, not only mining. In this demonstration, we show several scenarios that allow end users to solve different data quality problems. Namely, we showcase typo detection, data deduplication, and data anomaly detection scenarios.", "output": "Solving Data Quality Problems with Desbordante: a Demo."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models for Code (Code LLMs) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark.
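Since the PanGu-Coder2 abstract above reports pass@1 on HumanEval, here is the standard unbiased pass@k estimator used with HumanEval-style benchmarks, as a self-contained sketch; the per-problem sample counts are made-up toy numbers, not results from the paper.

```python
# The standard unbiased pass@k estimator for HumanEval-style evaluation:
# given n sampled completions per problem of which c pass the unit tests,
# pass@k = 1 - C(n - c, k) / C(n, k). The counts below are toy numbers.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k drawn samples passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: per-problem (n_samples, n_correct) pairs for a tiny benchmark.
results = [(20, 13), (20, 0), (20, 5)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(f"pass@1 = {score:.2%}")
```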
Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.", "output": "PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a computationally efficient framework for interval reachability of neural network controlled systems. Our approach builds upon inclusion functions for the neural network controller and the open-loop system. We observe that many state-of-the-art neural network verifiers can produce inclusion functions for neural networks. We introduce and analyze a new class of inclusion functions for the open-loop dynamics based on bounds of the function Jacobian that is particularly suitable for capturing the interactions between systems and neural network controllers. Next, for any dynamical system, we use inclusion functions to construct an embedding system with twice the number of states as the original system. We show that a single trajectory of this embedding system provides hyper-rectangular over-approximations of reachable sets. We then propose two approaches for constructing a closed-loop embedding system for a neural network controlled dynamical system that accounts for the interaction between the system and the controller in different ways. The interconnection-based approach accounts for the worst-case evolution of each coordinate separately by substituting the neural network inclusion function into the open-loop embedding system. The interaction-based approach uses the newly introduced class of Jacobian-based inclusion functions to fully capture first-order interactions between the system and the controller. Finally, we implement our approach in a Python framework called $\\texttt{ReachMM}$ and show that on several existing benchmarks, our methods outperform the existing approaches in the literature. We also demonstrate the scalability of our method on a vehicle platooning example with up to $200$ states.", "output": "Efficient Interaction-Aware Interval Analysis of Neural Network Feedback Loops."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The continuous dynamics of natural systems has been effectively modelled using Neural Ordinary Differential Equations (Neural ODEs). However, for accurate and meaningful predictions, it is crucial that the models follow the underlying rules or laws that govern these systems. In this work, we propose a self-adaptive penalty algorithm for Neural ODEs to enable modelling of constrained natural systems. The proposed self-adaptive penalty function can dynamically adjust the penalty parameters. The explicit introduction of prior knowledge helps to increase the interpretability of Neural ODE-based models. We validate the proposed approach by modelling three natural systems with prior knowledge constraints: population growth, chemical reaction evolution, and damped harmonic oscillator motion. The numerical experiments and a comparison with other penalty Neural ODE approaches and \\emph{vanilla} Neural ODE demonstrate the effectiveness of the proposed self-adaptive penalty algorithm for Neural ODEs in modelling constrained natural systems.
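The self-adaptive penalty abstract above does not spell out its update rule, so the sketch below shows a generic self-adaptive penalty loop on a toy constrained problem; the adaptation rule (grow the penalty weight when the violation stops shrinking) is my own illustrative assumption, not the paper's algorithm.

```python
# Hedged sketch of a self-adaptive penalty loop on a toy constrained problem:
# minimize f(x) = (x - 3)^2 subject to g(x) = x - 1 <= 0. The adaptation rule
# (multiply the penalty weight when the violation stops shrinking) is an
# illustrative assumption, not the paper's exact method.
def f_grad(x):       # gradient of the objective
    return 2.0 * (x - 3.0)

def g(x):            # inequality constraint, feasible when g(x) <= 0
    return x - 1.0

x, lam, lr = 0.0, 1.0, 0.05
prev_violation = float("inf")
for step in range(200):
    violation = max(g(x), 0.0)
    # Penalty term lam * max(g, 0)^2 contributes gradient 2 * lam * max(g, 0) * g'(x).
    grad = f_grad(x) + 2.0 * lam * violation
    x -= lr * grad
    if violation > 0.9 * prev_violation:   # violation not shrinking fast enough
        lam *= 1.5                          # self-adaptively strengthen the penalty
    prev_violation = max(violation, 1e-12)

print(f"x = {x:.3f} (constrained optimum is x = 1), final penalty weight = {lam:.1f}")
```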
Moreover, the self-adaptive penalty approach provides more accurate and robust models with reliable and meaningful predictions.", "output": "A Self-Adaptive Penalty Method for Integrating Prior Knowledge Constraints into Neural ODEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As the network scale increases, existing fully distributed solutions start to lag behind the real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks. In this paper, we focus on hierarchical system architecture and address the problem of non-Bayesian learning over networks that are vulnerable to communication failures and adversarial attacks. On network communication, we consider packet-dropping link failures. We first propose a hierarchical robust push-sum algorithm that can achieve average consensus despite frequent packet-dropping link failures. We provide a sparse information fusion rule between the parameter server and arbitrarily selected network representatives. Then, interleaving the consensus update step with a dual averaging update with Kullback-Leibler (KL) divergence as the proximal function, we obtain a packet-dropping fault-tolerant non-Bayesian learning algorithm with provable convergence guarantees. On external adversarial attacks, we consider Byzantine attacks in which the compromised agents can send maliciously calibrated messages to others (including both the agents and the parameter server). To avoid the curse of dimensionality of Byzantine consensus, we solve the non-Bayesian learning problem via running multiple dynamics, each of which only involves Byzantine consensus with scalar inputs. To facilitate resilient information propagation across sub-networks, we use a novel Byzantine-resilient gossiping-type rule at the parameter server.", "output": "Network Fault-tolerant and Byzantine-resilient Social Learning via Collaborative Hierarchical Non-Bayesian Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods in 3 benchmarks: Caltech-Office, Office 31, and CRWU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance.
Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.", "output": "Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the medical field, federated learning commonly deals with highly imbalanced datasets, including skin lesions and gastrointestinal images. Existing federated methods under highly imbalanced datasets primarily focus on optimizing a global model without incorporating the intra-class variations that can arise in medical imaging due to different populations, findings, and scanners. In this paper, we study the inter-client intra-class variations with publicly available self-supervised auxiliary networks. Specifically, we find that employing a shared auxiliary pre-trained model, like MoCo-V2, locally on every client yields consistent divergence measurements. Based on these findings, we derive a dynamic balanced model aggregation via self-supervised priors (MAS) to guide the global model optimization. Fed-MAS can be utilized with different local learning methods for effective model aggregation toward a highly robust and unbiased global model. Our code is available at \\url{", "output": "Federated Model Aggregation via Self-Supervised Priors for Highly Imbalanced Medical Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We use a combination of unsupervised clustering and sparsity-promoting inference algorithms to learn locally dominant force balances that explain macroscopic pattern formation in self-organized active particle systems. The self-organized emergence of macroscopic patterns from microscopic interactions between self-propelled particles can be widely observed in nature. Although hydrodynamic theories help us better understand the physical basis of this phenomenon, identifying a sufficient set of local interactions that shape, regulate, and sustain self-organized structures in active particle systems remains challenging. We investigate a classic hydrodynamic model of self-propelled particles that produces a wide variety of patterns, like asters and moving density bands. Our data-driven analysis shows that propagating bands are formed by local alignment interactions driven by density gradients, while steady-state asters are shaped by a mechanism of splay-induced negative compressibility arising from strong particle interactions. Our method also reveals analogous physical principles of pattern formation in a system where the speed of the particle is influenced by local density. This demonstrates the ability of our method to reveal physical commonalities across models. The physical mechanisms inferred from the data are in excellent agreement with analytical scaling arguments and experimental observations.", "output": "Learning locally dominant force balances in active particle systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision.
However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at", "output": "Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited. Re-running the model each time is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, looking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation, as even minor input changes cascade through the network and restrict information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons, facilitating the reuse of their values. We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of the modified inputs. Our experiments with adapting the OPT-125M pre-trained language model demonstrate comparable accuracy on document classification while requiring 12.1X (median) fewer operations for processing sequences of atomic edits.", "output": "Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization.
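A hedged conceptual sketch of the environment-wrapping idea in the Thinker abstract above: the wrapped step() routes ordinary actions to the real environment and adds extra "imagination" actions that roll a learned world model forward instead. The interface (a minimal reset/step protocol) and the convention that imagination restarts from the last real observation are my assumptions, not the paper's specification.

```python
# Hedged conceptual sketch of a Thinker-style environment wrapper. Assumed
# minimal env protocol: reset() -> obs, step(a) -> (obs, reward, done).
# Actions 0..n_real-1 act on the real environment; higher action ids step a
# learned world model instead (the "model-interaction" actions).
class ThinkerStyleWrapper:
    def __init__(self, env, world_model, n_real_actions):
        self.env = env                    # real environment (minimal protocol above)
        self.model = world_model          # learned model: predict(obs, a) -> (obs, reward)
        self.n_real = n_real_actions
        self.imagined_obs = None

    def reset(self):
        obs = self.env.reset()
        self.imagined_obs = obs           # imagination starts from reality
        return obs

    def step(self, action):
        if action < self.n_real:          # real action: commit to the environment
            obs, reward, done = self.env.step(action)
            self.imagined_obs = obs
            return obs, reward, done, {"imagined": False}
        # Model-interaction action: roll the world model forward instead.
        sim_action = action - self.n_real
        self.imagined_obs, sim_reward = self.model.predict(self.imagined_obs, sim_action)
        return self.imagined_obs, 0.0, False, {"imagined": True, "sim_reward": sim_reward}
```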
We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.", "output": "Thinker: Learning to Plan and Act."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increased deployment of machine learning models in various real-world applications, researchers and practitioners alike have emphasized the need for explanations of model behaviour. To this end, two broad strategies have been outlined in prior literature to explain models. Post hoc explanation methods explain the behaviour of complex black-box models by highlighting features that are critical to model predictions; however, prior work has shown that these explanations may not be faithful, and even more concerning is our inability to verify them. Specifically, it is nontrivial to evaluate if a given attribution is correct with respect to the underlying model. Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture, meaning their explanations are naturally faithful and verifiable, but they often exhibit poor predictive performance due to their limited expressive power. In this work, we aim to bridge the gap between the aforementioned strategies by proposing Verifiability Tuning (VerT), a method that transforms black-box models into models that naturally yield faithful and verifiable feature attributions. We begin by introducing a formal theoretical framework to understand verifiability and show that attributions produced by standard models cannot be verified. We then leverage this framework to propose a method to build verifiable models and feature attributions out of fully trained black-box models. Finally, we perform extensive experiments on semi-synthetic and real-world datasets, and show that VerT produces models that (1) yield explanations that are correct and verifiable and (2) are faithful to the original black-box models they are meant to explain.", "output": "Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023, a top computer security conference. We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.", "output": "A LLM Assisted Exploitation of AI-Guardian."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The processing of information is an indispensable property of living systems realized by networks of active processes with enormous complexity. They have inspired many variants of modern machine learning, one of them being reservoir computing, in which stimulating a network of nodes with fading memory enables computations and complex predictions. Reservoirs are implemented on computer hardware, but also on unconventional physical substrates such as mechanical oscillators, spins, or bacteria, often summarized as physical reservoir computing. Here we demonstrate physical reservoir computing with a synthetic active microparticle system that self-organizes from an active and passive component into inherently noisy nonlinear dynamical units. The self-organization and dynamical response of the unit is the result of a delayed propulsion of the microswimmer to a passive target. A reservoir of such units with a self-coupling via the delayed response can perform predictive tasks despite the strong noise resulting from Brownian motion of the microswimmers. To achieve efficient noise suppression, we introduce a special architecture that uses historical reservoir states for output. Our results pave the way for the study of information processing in synthetic self-organized active particle systems.", "output": "Harnessing Synthetic Active Particles for Physical Reservoir Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal Generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, under-water and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on", "output": "How Good is Google Bard's Visual Understanding?
An Empirical Study on Open Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Second, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system.", "output": "Samplable Anonymous Aggregation for Private Federated Data Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset, where training and testing take place on the in-distribution dataset. However, their performance deteriorates significantly when presented with unseen samples. As a result, a reliable deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance. Despite various attempts to enhance cross-dataset generalization, the problem remains challenging, particularly when testing against common post-processing perturbations, such as video compression or blur. Hence, this study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability, withstanding common corruptions and enabling feature explainability. The framework comprises three key components: a feature extractor based on vision Transformer architecture that is pre-trained via self-supervised contrastive learning methodology, a graph convolution network coupled with a Transformer discriminator, and a graph Transformer relevancy map that provides a better understanding of manipulated regions and further explains the model's decision. To assess the effectiveness of the proposed framework, several challenging experiments are conducted, including in-data distribution performance, cross-dataset, cross-manipulation generalization, and robustness against common post-production perturbations. The results achieved demonstrate the remarkable effectiveness of the proposed deepfake detection framework, surpassing the current state-of-the-art approaches.", "output": "Self-Supervised Graph Transformer for Deepfake Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Fourier neural operator (FNO) is a powerful technique for learning surrogate maps for partial differential equation (PDE) solution operators. For many real-world applications, which often require high-resolution data points, training time and memory usage are significant bottlenecks. While there are mixed-precision training techniques for standard neural networks, those work for real-valued datatypes on finite dimensions and therefore cannot be directly applied to FNO, which crucially operates in the (complex-valued) Fourier domain and in function spaces.
On the other hand, since the Fourier transform is already an approximation (due to discretization error), we do not need to perform the operation at full precision. In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations. Combined with the recently proposed tensorized FNO (Kossaifi et al., 2023), the resulting model has far better performance while also being significantly faster than the original FNO.", "output": "Speeding up Fourier Neural Operators via Mixed Precision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Because \"out-of-the-box\" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called \"jailbreaks\" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.", "output": "Universal and Transferable Adversarial Attacks on Aligned Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text.
Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the standard sequence-to-sequence Transformer, and compare their performance in terms of accuracy and speed. Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex. We employ pre-trained Transformers for both image understanding and language modeling. Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches for recognizing offline Arabic handwritten text.", "output": "A Transformer-based Approach for Arabic Offline Handwritten Text Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Approaches to recommendation are typically evaluated in one of two ways: (1) via a (simulated) online experiment, often seen as the gold standard, or (2) via some offline evaluation procedure, where the goal is to approximate the outcome of an online experiment. Several offline evaluation metrics have been adopted in the literature, inspired by ranking metrics prevalent in the field of Information Retrieval. (Normalised) Discounted Cumulative Gain (nDCG) is one such metric that has seen widespread adoption in empirical studies, and higher (n)DCG values have been used to present new methods as the state-of-the-art in top-$n$ recommendation for many years. Our work takes a critical look at this approach, and investigates when we can expect such metrics to approximate the gold standard outcome of an online experiment. We formally present the assumptions that are necessary to consider DCG an unbiased estimator of online reward and provide a derivation for this metric from first principles, highlighting where we deviate from its traditional uses in IR. Importantly, we show that normalising the metric renders it inconsistent, in that even when DCG is unbiased, ranking competing methods by their normalised DCG can invert their relative order. Through a correlation analysis between off- and on-line experiments conducted on a large-scale recommendation platform, we show that our unbiased DCG estimates strongly correlate with online reward, even when some of the metric's inherent assumptions are violated.
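To ground the nDCG discussion above, here is a minimal sketch of DCG and nDCG with the standard logarithmic discount; the two toy "systems" are made-up relevance vectors constructed to show the inconsistency the abstract describes, namely that averaging over users, one system can win on DCG while the other wins on nDCG.

```python
# Minimal DCG / nDCG with the standard log2 discount. The toy "systems" are
# constructed so that, averaged over two users, system A wins on DCG while
# system B wins on nDCG, illustrating the inversion the abstract describes.
import math

def dcg(rels):
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, ideal_rels):
    return dcg(rels) / dcg(ideal_rels)

# Per-user (ranked relevances, ideal relevances); values are made-up toys.
system_a = [([3, 3, 0], [3, 3, 3]), ([0, 0, 1], [1])]
system_b = [([0, 3, 0], [3, 3, 3]), ([1, 0, 0], [1])]

for name, runs in (("A", system_a), ("B", system_b)):
    avg_dcg = sum(dcg(r) for r, _ in runs) / len(runs)
    avg_ndcg = sum(ndcg(r, i) for r, i in runs) / len(runs)
    print(f"system {name}: avg DCG = {avg_dcg:.3f}, avg nDCG = {avg_ndcg:.3f}")
# Output: A wins on average DCG, but B wins on average nDCG.
```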
This statement no longer holds for its normalised variant, suggesting that nDCG's practical utility may be limited.", "output": "On (Normalised) Discounted Cumulative Gain as an Offline Evaluation Metric for Top-$n$ Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) was a multidisciplinary research topic of the past decade involving (Physical and Software) Agents and Law since the beginning, but recently evolved towards News-claimed Robot Lawyer since 2016. One of these first proposals of restricting the behaviour of Software Agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) Community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of Artificial Neural Networks as Agent-based Training of a special type of regulated Artificial Neural Network that we call Institutional Neural Network (INN). The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provides $I^*$, a language previously used to declaratively model and extend Electronic Institutions, as a means to regulate the execution of Artificial Neural Networks and their interactions with Artificial Teachers (ATs).", "output": "Towards Regulated Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation by playing a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on the insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.", "output": "On the Generalization Effects of Linear Transformations in Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present computational and experimental results on how artificial intelligence (AI) learns to control an Acrobot using reinforcement learning (RL).
The experimental setup is thereby designed as an embedded system, which is of interest for robotics and energy harvesting applications. Specifically, we study the control of the angular velocity of the Acrobot, as well as control of its total energy, which is the sum of the kinetic and the potential energy. By this means, the RL algorithm is designed to drive the angular velocity or the energy of the first pendulum of the Acrobot towards a desired value. With this, libration or full rotation of the unactuated pendulum of the Acrobot is achieved. Moreover, investigations of the Acrobot control are carried out, which lead to insights about the influence of the state space discretization, the episode length, the action space, or the mass of the driven pendulum on the RL control. By further numerous simulations and experiments, the effects of parameter variations are evaluated.", "output": "Experimental Study on Reinforcement Learning-based Control of an Acrobot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It has long been believed that the brain is highly modular both in terms of structure and function, although recent evidence has led some to question the extent of both types of modularity. We used artificial neural networks to test the hypothesis that structural modularity is sufficient to guarantee functional specialization, and find that in general, this doesn't necessarily hold except at extreme levels. We then systematically tested which features of the environment and network do lead to the emergence of specialization. We used a simple toy environment, task and network, allowing us precise control, and show that in this setup, several distinct measures of specialization give qualitatively similar results. We further find that (1) specialization can only emerge in environments where features of that environment are meaningfully separable, (2) specialization preferentially emerges when the network is strongly resource-constrained, and (3) these findings are qualitatively similar across different network architectures, but the quantitative relationships depend on the architecture type. Finally, we show that functional specialization varies dynamically across time, and demonstrate that these dynamics depend on both the timing and bandwidth of information flow in the network. We conclude that a static notion of specialization, based on structural modularity, is likely too simple a framework for understanding intelligent systems in situations of real-world complexity. We propose that thoroughly stress testing candidate definitions of functional modularity in simplified scenarios before extending to more complex data, network models and electrophysiological recordings is likely to be a fruitful approach.", "output": "Dynamics of specialization in neural modules under resource constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Secure aggregation is a critical component in federated learning (FL), which enables the server to learn the aggregate model of the users without observing their local models. Conventionally, secure aggregation algorithms focus only on ensuring the privacy of individual users in a single training round.
We contend that such designs can lead to significant privacy leakage over multiple training rounds, due to partial user selection/participation at each round of FL. In fact, we show that the conventional random user selection strategies in FL lead to leaking users' individual models within a number of rounds that is linear in the number of users. To address this challenge, we introduce a secure aggregation framework, Multi-RoundSecAgg, with multi-round privacy guarantees. In particular, we introduce a new metric to quantify the privacy guarantees of FL over multiple training rounds, and develop a structured user selection strategy that guarantees the long-term privacy of each user (over any number of training rounds). Our framework also carefully accounts for the fairness and the average number of participating users at each round. Our experiments on MNIST and CIFAR-10 datasets in the IID and the non-IID settings demonstrate the performance improvement over the baselines, both in terms of privacy protection and test accuracy.", "output": "Securing Secure Aggregation: Mitigating Multi-Round Privacy Leakage in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we study the convergence analysis of our ComFedL algorithm under some mild conditions, and prove that it achieves a convergence rate of $O(\\frac{1}{\\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also first transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we apply two popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate the effectiveness of our algorithm.", "output": "Compositional federated learning: Applications in distributionally robust averaging and meta learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning models always make a prediction, even when it is likely to be inaccurate. This behavior should be avoided in many decision support applications, where mistakes can have severe consequences. Albeit already studied in 1970, machine learning with rejection recently gained interest. This machine learning subfield enables machine learning models to abstain from making a prediction when likely to make a mistake. This survey aims to provide an overview on machine learning with rejection. We introduce the conditions leading to two types of rejection, ambiguity and novelty rejection, which we carefully formalize.
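A hedged sketch of the two rejection types the survey abstract above distinguishes: ambiguity rejection (the model is unsure between classes) and novelty rejection (the input looks unlike the training data). The stand-in model and both thresholds below are illustrative choices, not the survey's formalization.

```python
# Hedged sketch of ambiguity vs. novelty rejection. The probabilistic model
# is a fixed stand-in, and both thresholds are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(500, 2))        # stand-in training data
centroid = train.mean(axis=0)
radius_95 = np.quantile(np.linalg.norm(train - centroid, axis=1), 0.95)

def predict_proba(x):
    # Stand-in two-class probabilistic model (a fixed linear-logistic rule).
    p1 = 1.0 / (1.0 + np.exp(-(x[0] + x[1])))
    return np.array([1.0 - p1, p1])

def predict_with_reject(x, ambiguity_thresh=0.75):
    if np.linalg.norm(x - centroid) > radius_95:
        return "REJECT (novelty: far from training data)"
    proba = predict_proba(x)
    if proba.max() < ambiguity_thresh:
        return "REJECT (ambiguity: low confidence)"
    return f"class {int(proba.argmax())}"

for x in (np.array([1.5, 1.0]), np.array([0.1, -0.1]), np.array([8.0, 8.0])):
    print(x, "->", predict_with_reject(x))
```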
Moreover, we review and categorize strategies to evaluate a model's predictive and rejective quality. Additionally, we define the existing architectures for models with rejection and describe the standard techniques for learning such models. Finally, we provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.", "output": "Machine Learning with a Reject Option: A survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional machine learning paradigms are based on the assumption that both training and test data follow the same statistical pattern, which is mathematically referred to as Independent and Identically Distributed ($i.i.d.$). However, in real-world applications, this $i.i.d.$ assumption often fails to hold due to unforeseen distributional shifts, leading to considerable degradation in model performance upon deployment. This observed discrepancy indicates the significance of investigating the Out-of-Distribution (OOD) generalization problem. OOD generalization is an emerging topic of machine learning research that focuses on complex scenarios wherein the distributions of the test data differ from those of the training data. This paper represents the first comprehensive, systematic review of OOD generalization, encompassing a spectrum of aspects from problem definition, methodological development, and evaluation procedures, to the implications and future directions of the field. Our discussion begins with a precise, formal characterization of the OOD generalization problem. Following that, we categorize existing methodologies into three segments: unsupervised representation learning, supervised model learning, and optimization, according to their positions within the overarching learning process. We provide an in-depth discussion on representative methodologies for each category, further elucidating the theoretical links between them. Subsequently, we outline the prevailing benchmark datasets employed in OOD generalization studies. To conclude, we overview the existing body of work in this domain and suggest potential avenues for future research on OOD generalization. A summary of the OOD generalization methodologies surveyed in this paper can be accessed at", "output": "Towards Out-Of-Distribution Generalization: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work we propose RELDEC, a novel approach for sequential decoding of moderate length low-density parity-check (LDPC) codes. The main idea behind RELDEC is that an optimized decoding policy is subsequently obtained via reinforcement learning based on a Markov decision process (MDP). In contrast to our previous work, where an agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, in this work we train the agent to schedule all CNs in a cluster, and all clusters in every iteration. That is, in each learning step of RELDEC an agent learns to schedule CN clusters sequentially depending on a reward associated with the outcome of scheduling a particular cluster. We also modify the state space representation of the MDP, enabling RELDEC to be suitable for larger block length LDPC codes than those studied in our previous work.
Furthermore, to address decoding under varying channel conditions, we propose agile meta-RELDEC (AM-RELDEC) that employs meta-reinforcement learning. The proposed RELDEC scheme significantly outperforms standard flooding and random sequential decoding for a variety of LDPC codes, including codes designed for 5G new radio.", "output": "RELDEC: Reinforcement Learning-Based Decoding of Moderate Length LDPC Codes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance while they still follow a parametric-based learning paradigm; the oblivion and rote memorization problems in learning may encounter unstable generalization issues. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets. Detailed analysis of memorization indeed reveals RetroPrompt can reduce the reliance of language models on memorization; thus, improving generalization for downstream tasks. Code is available in", "output": "Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical model of data distributions. However, this phenomenon has only been studied for specific tasks and metrics, or by relying on computationally expensive methods. We address this by providing an algorithmic framework for gaussianizing data distributions via averaging, proving that it is possible to efficiently construct data sketches that are nearly indistinguishable (in terms of total variation distance) from sub-gaussian random designs. In particular, relying on a recently introduced sketching technique called Leverage Score Sparsified (LESS) embeddings, we show that one can construct an $n\\times d$ sketch of an $N\\times d$ matrix $A$, where $n\\ll N$, that is nearly indistinguishable from a sub-gaussian design, in time $O(\\text{nnz}(A)\\log N + nd^2)$, where $\\text{nnz}(A)$ is the number of non-zero entries in $A$.
As a consequence, strong statistical guarantees and precise asymptotics available for the estimators produced from sub-gaussian designs (e.g., for least squares and Lasso regression, covariance estimation, low-rank approximation, etc.) can be straightforwardly adapted to our sketching framework. We illustrate this with a new approximation guarantee for sketched least squares, among other examples.", "output": "Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. As crucial diagnostic tools, it is important that these explainers themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.", "output": "Analyzing Explainer Robustness via Lipschitzness of Prediction Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work we propose an algorithm for trace recovery from stochastically known logs, a setting that is becoming more common with the increasing number of sensors and predictive models that generate uncertain data. The suggested approach calculates the conformance between a process model and a stochastically known trace and recovers the best alignment within this stochastic trace as the true trace. The paper offers an analysis of the impact of various cost models on trace recovery accuracy and makes use of a product multi-graph to compare alternative trace recovery options. The average accuracy of our approach, evaluated using two publicly available datasets, is impressive, with an average recovery accuracy score of 90-97%, significantly improving a common heuristic that chooses the most likely value for each uncertain activity. We believe that the effectiveness of the proposed algorithm in recovering correct traces from stochastically known logs may be a powerful aid for developing credible decision-making tools in uncertain settings.", "output": "Trace Recovery from Stochastically Known Logs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One powerful paradigm in visual navigation is to predict actions from observations directly.
Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data inefficient. We hypothesize a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at ", "output": "Visual Pre-training for Navigation: What Can We Learn from Noise?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both those features lies in the definition of objective functions that can be back-propagated.
In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while computationally scaling better with the dimension of the predictors.", "output": "Neural Networks for Scalar Input and Functional Output."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of \"acyclicity\"; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.", "output": "On the non-efficient PAC learnability of conjunctive queries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state `task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks.", "output": "Learning Task Automata for Reinforcement Learning using Hidden Markov Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have become compelling models designed to perform learning and inference on graph-structured data. However, little work has been done to understand the fundamental limitations of GNNs for scaling to larger graphs and generalizing to out-of-distribution (OOD) inputs.
In this paper, we use a random graph generator to systematically investigate how the graph size and structural properties affect the predictive performance of GNNs. We present specific evidence that the average node degree is a key feature in determining whether GNNs can generalize to unseen graphs, and that the use of multiple node update functions can improve the generalization performance of GNNs when dealing with graphs of multimodal degree distributions. Accordingly, we propose a multi-module GNN framework that allows the network to adapt flexibly to new graphs by generalizing a single canonical nonlinear transformation over aggregated inputs. Our results show that the multi-module GNNs improve the OOD generalization on a variety of inference tasks in the direction of diverse structural features.", "output": "Towards Better Generalization with Flexible Representation of Multi-Module Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider artificial neural networks (ANNs), the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the ANN provides accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose considering the latent feature representation of the data (called \"embedding\") generated by the ANN to determine the time when the data stream starts being nonstationary. In particular, we monitor embeddings by applying multivariate control charts based on the data depth calculation and normalized ranks. The performance of the introduced method is compared with benchmark approaches for various ANN architectures and different underlying data formats.", "output": "Statistical process monitoring of artificial neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $\\theta \\in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\\epsilon$-game-stationary points and $\\epsilon$-optimization-stationary points of the problems of interest in $\\mathcal{O}(\\epsilon^{-2\\max\\{2\\theta,1\\}})$ iterations.
Furthermore, when $\\theta \\in [0,\\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\\mathcal{O}(\\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problems possess the KL property with exponent $\\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.", "output": "Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inference of transfer operators from data is often formulated as a classical problem that hinges on the Ulam method. The conventional description, known as the Ulam-Galerkin method, involves projecting onto basis functions represented as characteristic functions supported over a fine grid of rectangles. From this perspective, the Ulam-Galerkin approach can be interpreted as density estimation using the histogram method. In this study, we recast the problem within the framework of statistical density estimation. This alternative perspective allows for an explicit and rigorous analysis of bias and variance, thereby facilitating a discussion on the mean square error. Through comprehensive examples utilizing the logistic map and a Markov map, we demonstrate the validity and effectiveness of this approach in estimating the eigenvectors of the Frobenius-Perron operator. We compare the performance of Histogram Density Estimation (HDE) and Kernel Density Estimation (KDE) methods and find that KDE generally outperforms HDE in terms of accuracy. However, it is important to note that KDE exhibits limitations around boundary points and jumps. Based on our research findings, we suggest the possibility of incorporating other density estimation methods into this field and propose future investigations into the application of KDE-based estimation for high-dimensional maps. These findings provide valuable insights for researchers and practitioners working on estimating the Frobenius-Perron operator and highlight the potential of density estimation techniques in this area of study. Keywords: Transfer Operators; Frobenius-Perron operator; probability density estimation; Ulam-Galerkin method; Kernel Density Estimation; Histogram Density Estimation.", "output": "Learning Transfer Operators by Kernel Density Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Solving parity games is a major building block for numerous applications in reactive program verification and synthesis. While they can be solved efficiently in practice, no known approach has a polynomial worst-case runtime complexity. We present an incomplete polynomial-time approach to determining the winning regions of parity games via graph neural networks. Our evaluation on 900 randomly generated parity games shows that this approach is effective and efficient in practice. It correctly determines the winning regions of $\\sim$60% of the games in our data set and only incurs minor errors in the remaining ones.
We believe that this approach can be extended to efficiently solve parity games as well.", "output": "Predicting Winning Regions in Parity Games via Graph Neural Networks (Extended Abstract)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models. However, while certain pieces of information are ubiquitous on the web, others appear extremely rarely. In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web. In particular, we show that a language model's ability to answer a fact-based question relates to how many documents associated with that question were seen during pre-training. We identify these relevant documents by entity linking pre-training datasets and counting documents that contain the same entities as a given question-answer pair. Our results demonstrate strong correlational and causal relationships between accuracy and relevant document count for numerous question answering datasets (e.g., TriviaQA), pre-training corpora (e.g., ROOTS), and model sizes (e.g., 176B parameters). Moreover, while larger models are better at learning long-tail knowledge, we estimate that today's models must be scaled by many orders of magnitude to reach competitive QA performance on questions with little support in the pre-training data. Finally, we show that retrieval-augmentation can reduce the dependence on relevant pre-training information, presenting a promising approach for capturing the long-tail.", "output": "Large Language Models Struggle to Learn Long-Tail Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stochastic restoration algorithms allow us to explore the space of solutions that correspond to the degraded input. In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a posterior sampler, and is thus required to be stochastic. Second, we illustrate that while deterministic restoration algorithms may attain high perceptual quality, this can be achieved only by filling up the space of all possible source images using an extremely sensitive mapping, which makes them highly vulnerable to adversarial attacks.
Indeed, we show that enforcing deterministic models to be robust to such attacks profoundly hinders their perceptual quality, while robustifying stochastic models hardly influences their perceptual quality, and improves their output variability. These findings provide a motivation to foster progress in stochastic restoration methods, paving the way to better recovery algorithms.", "output": "Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised Domain Adaptation (UDA) has emerged as a powerful solution for the domain shift problem via transferring the knowledge from a labeled source domain to a shifted unlabeled target domain. Despite the prevalence of UDA for visual applications, it remains relatively less explored for time-series applications. In this work, we propose a novel lightweight contrastive domain adaptation framework called CoTMix for time-series data. Unlike existing approaches that either use statistical distances or adversarial techniques, we leverage contrastive learning solely to mitigate the distribution shift across the different domains. Specifically, we propose a novel temporal mixup strategy to generate two intermediate augmented views for the source and target domains. Subsequently, we leverage contrastive learning to maximize the similarity between each domain and its corresponding augmented view. The generated views consider the temporal dynamics of time-series data during the adaptation process while inheriting the semantics among the two domains. Hence, we gradually push both domains towards a common intermediate space, mitigating the distribution shift across them. Extensive experiments conducted on five real-world time-series datasets show that our approach can significantly outperform all state-of-the-art UDA methods. The implementation code of CoTMix is available at \\href{", "output": "Contrastive Domain Adaptation for Time-Series via Temporal Mixup."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural network (NN) potentials promise highly accurate molecular dynamics (MD) simulations within the computational complexity of classical MD force fields. However, when applied outside their training domain, NN potential predictions can be inaccurate, increasing the need for Uncertainty Quantification (UQ). Bayesian modeling provides the mathematical framework for UQ, but classical Bayesian methods based on Markov chain Monte Carlo (MCMC) are computationally intractable for NN potentials. By training graph NN potentials for coarse-grained systems of liquid water and alanine dipeptide, we demonstrate here that scalable Bayesian UQ via stochastic gradient MCMC (SG-MCMC) yields reliable uncertainty estimates for MD observables. We show that cold posteriors can reduce the required training data size and that for reliable UQ, multiple Markov chains are needed. Additionally, we find that SG-MCMC and the Deep Ensemble method achieve comparable results, despite shorter training and less hyperparameter tuning of the latter.
We show that both methods can capture aleatoric and epistemic uncertainty reliably, but not systematic uncertainty, which needs to be minimized by adequate modeling to obtain accurate credible intervals for MD observables. Our results represent a step towards accurate UQ that is of vital importance for trustworthy NN potential-based MD simulations required for decision-making in practice.", "output": "Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.", "output": "MixupE: Understanding and Improving Mixup from Directional Derivative Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, it is still an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are more scarce. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture.
Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework is able to find better solutions close to optimality without requiring a large amount of backtracks while being generic.", "output": "Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing causal models for link prediction assume an underlying set of inherent node factors -- an innate characteristic defined at the node's birth -- that governs the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent: The outcome of link interventions depends on existing links. Unfortunately, these existing causal methods are not designed for path-dependent link formation, as the cascading functional dependencies between links (arising from path dependence) are either unidentifiable or require an impractical number of control variables. To overcome this, we develop the first causal model capable of dealing with path dependencies in link prediction. In this work we introduce the concept of causal lifting, an invariance in causal models of independent interest that, on graphs, allows the identification of causal link prediction queries using limited interventional data. Further, we show how structural pairwise embeddings exhibit lower bias and correctly represent the task's causal structure, as opposed to existing node embeddings, e.g., graph neural network node embeddings and matrix factorization. Finally, we validate our theoretical findings on three scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation and consumer-product recommendations.", "output": "Causal Lifting and Link Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each represented by a classical or neural field representation which operates on transformed input coordinates. This decomposition results in a unified framework that accommodates several recent signal representations including NeRF, Plenoxels, EG3D, Instant-NGP, and TensoRF. Additionally, our framework allows for the creation of powerful new signal representations, such as the \"Dictionary Field\" (DiF) which is a second contribution of this paper. Our experiments show that DiF leads to improvements in approximation quality, compactness, and training time when compared to previous fast reconstruction methods. Experimentally, our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields, and higher compactness for radiance field reconstruction tasks.
Furthermore, DiF enables generalization to unseen images/3D scenes by sharing bases across signals during training which greatly benefits use cases such as image regression from sparse observations and few-shot radiance field reconstruction.", "output": "Factor Fields: A Unified Framework for Neural Fields and Beyond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affects the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade off between an underage and an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under ``close by'' contexts come from close by distributions and analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach, and isolate a structure in the Newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obfuscated by previous general-purpose bounds. We characterize actual guaranteed performance as a function of the contexts, as well as granular insights on the learning curve of algorithms.", "output": "From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the application of emerging machine learning methods from image super-resolution (SR) to the task of statistical downscaling. We specifically focus on convolutional neural network-based Generative Adversarial Networks (GANs). Our GANs are conditioned on low-resolution (LR) inputs to generate high-resolution (HR) surface winds emulating Weather Research and Forecasting (WRF) model simulations over North America. Unlike traditional SR models, where LR inputs are idealized coarsened versions of the HR images, WRF emulation involves using non-idealized LR and HR pairs resulting in shared-scale mismatches due to internal variability. Our study builds upon current SR-based statistical downscaling by experimenting with a novel frequency-separation (FS) approach from the computer vision field. To assess the skill of SR models, we carefully select evaluation metrics, and focus on performance measures based on spatial power spectra. Our analyses reveal how GAN configurations influence spatial structures in the generated fields, particularly biases in spatial variability spectra.
Using power spectra to evaluate the FS experiments reveals that successful applications of FS in computer vision do not translate to climate fields. However, the FS experiments demonstrate the sensitivity of power spectra to a commonly used GAN-based SR objective function, which helps interpret and understand its role in determining spatial structures. This result motivates the development of a novel partial frequency-separation scheme as a promising configuration option. We also quantify the influence on GAN performance of non-idealized LR fields resulting from internal variability. Furthermore, we conduct a spectra-based feature-importance experiment allowing us to explore the dependence of the spatial structure of generated fields on different physically relevant LR covariates.", "output": "Algorithmic Hallucinations of Near-Surface Winds: Statistical Downscaling with Generative Adversarial Networks to Convection-Permitting Scales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An Electroencephalogram (EEG) is a non-invasive exam that records the electrical activity of the brain. This exam is used to help diagnose conditions such as different brain problems. EEG signals are taken for the purpose of epilepsy detection, and with Discrete Wavelet Transform (DWT) and a machine learning classifier, they perform epilepsy detection. In epilepsy seizure detection, mainly machine learning classifiers and statistical features are used. The hidden information in the EEG signal is useful for detecting diseases affecting the brain. Sometimes it is very difficult to identify the minimum changes in the EEG in the time and frequency domains. The DWT can give a good decomposition of the signals in different frequency bands and feature extraction. We use three dimensionality reduction algorithms: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). Finally, features are selected by using a fusion rule, and at the last step three different classifiers, Support Vector Machine (SVM), Naive Bayes (NB) and K-Nearest-Neighbor (KNN), have been used individually for the classification. The proposed framework is tested on the Bonn dataset and the simulation results provide accuracy for the combinations of LDA and SVM 89.17%, LDA and KNN 80.42%, PCA and NB 89.92%, PCA and SVM 85.58%, PCA and KNN 80.42%, ICA and NB 82.33%, ICA and SVM 90.42%, ICA and KNN 90%, and LDA and NB 100%. The LDA and NB combination shows sensitivity, specificity, accuracy, precision, and recall of 100%, 100%, 100%, 100%, and 100%. This combination of the LDA and NB method provides an accuracy of 100%, outperforming all existing methods. The results prove the effectiveness of this model.", "output": "Empirical analysis of Different Dimensionality Reduction and classification Techniques for Epileptic Seizure detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks.
However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR), since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed as common rationales in this paper. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective without using any pre-trained object parts or saliency detectors, allowing it to be seamlessly integrated with the existing SSL process. Specifically, we fit the GradCAM with a branch with limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method can lead to a significant improvement in different evaluation settings.", "output": "Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Latent linear dynamical systems with Bernoulli observations provide a powerful modeling framework for identifying the temporal dynamics underlying binary time series data, which arise in a variety of contexts such as binary decision-making and discrete stochastic processes (e.g., binned neural spike trains). Here we develop a spectral learning method for fast, efficient fitting of probit-Bernoulli latent linear dynamical system (LDS) models. Our approach extends traditional subspace identification methods to the Bernoulli setting via a transformation of the first and second sample moments. This results in a robust, fixed-cost estimator that avoids the hazards of local optima and the long computation time of iterative fitting procedures like the expectation-maximization (EM) algorithm. In regimes where data is limited or assumptions about the statistical structure of the data are not met, we demonstrate that the spectral estimate provides a good initialization for Laplace-EM fitting. Finally, we show that the estimator provides substantial benefits to real world settings by analyzing data from mice performing a sensory decision-making task.", "output": "Spectral learning of Bernoulli linear dynamical systems models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing, along with the growing need for greater product customization. One of the main challenges in realizing such automation resides in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies. Besides, costly feasibility checks are always required for the robotic system.
To address this, we propose a holistic graphical approach including a graph representation called Assembly Graph for product assemblies and a policy architecture, Graph Assembly Processing Network, dubbed GRACE, for assembly sequence generation. With GRACE, we are able to extract meaningful information from the graph input and predict assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. We further demonstrate that our method is capable of detecting infeasible assemblies, substantially alleviating the undesirable impacts from false predictions, and hence facilitating real-world deployment soon. Code and training data are available at ", "output": "Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns of patient data. Moreover, low-count PET data at different institutions may have different data distributions, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need of aggregating local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.", "output": "FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robots operating in real-world environments must reason about possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is the problem of confounding, which if left untreated can lead to prediction errors.
The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems. However, due to a lack of explicit causal semantics, POMDP planning methods are prone to confounding bias and thus in the presence of unobserved confounders may produce underperforming policies. This paper presents a novel causally-informed extension of \"anytime regularized determinized sparse partially observable tree\" (AR-DESPOT), a modern anytime online POMDP planner, using causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline the partial parameterisation of the causal model for planning, from ground truth model data. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher performing policies than AR-DESPOT.", "output": "CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed that produce a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to the end-to-end ML lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional datasets. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is the model accuracy degradation caused by the difference between training data and testing data during the ML lifetime, which leads to lifecycle rebuild. Our system helps to detect this mismatch without getting labeled data from testing data and rebuild the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.", "output": "VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is the most popular distributed machine learning technique. However, implementation of FL over modern wireless networks faces key challenges caused by (i) dynamics of the network conditions and (ii) the coexistence of multiple FL services/tasks and other network services in the system, which are not jointly considered in prior works. Motivated by these challenges, we introduce a generic FL paradigm over NextG networks, called dynamic multi-service FL (DMS-FL).
We identify three unexplored design considerations in DMS-FL: (i) FL service operator accumulation, (ii) wireless resource fragmentation, and (iii) signal strength fluctuations. We take the first steps towards addressing these design considerations by proposing a novel distributed ML architecture called elastic virtualized FL (EV-FL). EV-FL unleashes the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology to execute FL services. It further constitutes a multi-time-scale FL management system that introduces three dimensions into existing FL architectures: (i) virtualization, (ii) scalability, and (iii) elasticity. Through investigating EV-FL, we reveal a series of open research directions for future work. We finally simulate EV-FL to demonstrate its potential in saving wireless resources and increasing fairness among FL services.", "output": "Synergies Between Federated Learning and O-RAN: Towards an Elastic Virtualized Architecture for Multiple Distributed Machine Learning Services."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fuzzy time series forecasting (FTSF) is a typical forecasting method with wide application. Traditional FTSF is regarded as an expert system, which leads to a loss of the ability to recognize undefined features. This is the main reason for poor forecasting with FTSF. To solve the problem, the proposed model, Differential Fuzzy Convolutional Neural Network (DFCNN), utilizes a convolutional neural network to re-implement FTSF with learnable ability. DFCNN is capable of recognizing potential information and improving forecasting accuracy. Thanks to the learnable ability of the neural network, the length of fuzzy rules established in FTSF is expanded to an arbitrary length that the expert is not able to handle by the expert system. At the same time, FTSF usually cannot achieve satisfactory performance on non-stationary time series due to the trend of non-stationary time series. The trend of non-stationary time series causes the fuzzy set established by FTSF to be invalid and causes the forecasting to fail. DFCNN utilizes the Difference algorithm to weaken the non-stationarity of time series, so that DFCNN can forecast non-stationary time series with a low error that FTSF cannot achieve in satisfactory performance. After a mass of experiments, DFCNN has an excellent prediction effect, which is ahead of the existing FTSF and common time series forecasting algorithms. Finally, DFCNN provides further ideas for improving FTSF and holds continued research value.", "output": "Differential Convolutional Fuzzy Time Series Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Towards safe autonomous driving (AD), we consider the problem of learning models that accurately capture the diversity and tail quantiles of human driver behavior probability distributions, in interaction with an AD vehicle. Such models, which predict drivers' continuous actions from their states, are particularly relevant for closing the gap between AD agent simulations and reality.
To this end, we adapt two flexible quantile learning frameworks for this setting that avoid strong distributional assumptions: (1) quantile regression (based on the tilted absolute loss), and (2) autoregressive quantile flows (a version of normalizing flows). Training happens in a behavior-cloning fashion. We use the highD dataset consisting of driver trajectories on several highways. We evaluate our approach in a one-step acceleration prediction task, and in multi-step driver simulation rollouts. We report quantitative results using the tilted absolute loss as metric, give qualitative examples showing that realistic extremal behavior can be learned, and discuss the main insights.", "output": "On Learning the Tail Quantiles of Driving Behavior Distributions via Quantile Regression and Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works show that the data distribution in a network's latent space is useful for estimating classification uncertainty and detecting Out-of-distribution (OOD) samples. To obtain a well-regularized latent space that is conducive for uncertainty estimation, existing methods bring in significant changes to model architectures and training procedures. In this paper, we present a lightweight, fast, and high-performance regularization method for Mahalanobis distance-based uncertainty prediction that requires minimal changes to the network's architecture. To derive a Gaussian latent representation favourable for Mahalanobis distance calculation, we introduce a self-supervised representation learning method that separates in-class representations into multiple Gaussians. Classes with non-Gaussian representations are automatically identified and dynamically clustered into multiple new classes that are approximately Gaussian. Evaluation on standard OOD benchmarks shows that our method achieves state-of-the-art results on OOD detection with minimal inference time, and is very competitive on predictive probability calibration. Finally, we show the applicability of our method to a real-life computer vision use case on microorganism classification.", "output": "Gaussian Latent Representations for Uncertainty Estimation using Mahalanobis Distance in Deep Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recognition problems in long-tailed data, where the sample size per class is heavily skewed, have recently gained importance because the distribution of the sample size per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various approaches have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it is known for its high performance against existing methods devised in various ways. However, there is a lack of understanding as to why this approach is effective for long-tailed data. In this study, we analyze the method focusing on neural collapse and cone effect at each training stage and find that it can be decomposed into the increase in Fisher's discriminant ratio of the feature extractor caused by weight decay and cross entropy loss, and implicit logit adjustment caused by weight decay and class-balanced loss.
Our analysis shows that the training method can be further simplified by reducing the number of training stages to one while increasing accuracy.", "output": "Exploring Weight Balancing on Long-Tailed Recognition Problem."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. \"scheduling a doctor's appointment without a phone\". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual settings, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete and often surpass their larger teacher models' capabilities.", "output": "PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \\in \\mathbb{R}^{n \\times n}$, a weight matrix $W \\in \\mathbb{R}_{\\geq 0}^{n \\times n}$, and a parameter $k$, the goal is to output two matrices $U, V \\in \\mathbb{R}^{n \\times k}$ such that $\\| W \\circ (M - U V^\\top) \\|_F$ is minimized, where $\\circ$ denotes the Hadamard product. Such a problem is known to be NP-hard and even hard to approximate assuming the Exponential Time Hypothesis [GG11, RSW16]. Meanwhile, alternating minimization is a good heuristic solution for approximating weighted low rank approximation. The work [LLR16] shows that, under mild assumptions, alternating minimization does provide provable guarantees. In this work, we develop an efficient and robust framework for alternating minimization. For weighted low rank approximation, this improves the runtime of [LLR16] from $n^2 k^2$ to $n^2 k$. At the heart of our framework is a high-accuracy multiple response regression solver together with a robust analysis of alternating minimization.", "output": "Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain.
Since the laws of physics do not depend on the coordinate system used to describe them, it is desirable to encode such symmetries in the neural operator architecture for better performance and easier learning. While encoding symmetries in the physical domain using group theory has been studied extensively, how to capture symmetries in the frequency domain is under-explored. In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. The resulting $G$-FNO architecture generalizes well across input resolutions and performs well in settings with varying levels of symmetry. Our code is publicly available as part of the AIRS library (", "output": "Group Equivariant Fourier Neural Operators for Partial Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian state and parameter estimation have been automated effectively in a variety of probabilistic programming languages. The process of model comparison, on the other hand, which still requires error-prone and time-consuming manual derivations, is often overlooked despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference, and model comparison, can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for the straightforward extension to hierarchical and temporal model priors to accommodate for modeling complicated time-varying processes.", "output": "Automating Model Comparison in Factor Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study property prediction for crystal materials. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in many existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we model the complete set of potentials among all atoms, instead of only between nearby atoms as in existing methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of interatomic potentials and complete interatomic potentials leads to consistent performance improvements with reasonable computational costs. 
Our code is publicly available as part of the AIRS library (", "output": "Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources with the shared feature or sample space. How to valuate fragmented data sources, of which each only contains partial features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.", "output": "2D-Shapley: A Framework for Fragmented Data Valuation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The option of sharing images, videos and audio files on social media opens up new possibilities for distinguishing between false information and fake news on the Internet. Due to the vast amount of data shared every second on social media, not all data can be verified by a computer or a human expert. Here, a check-worthiness analysis can be used as a first step in the fact-checking pipeline and as a filtering mechanism to improve efficiency. This paper proposes a novel way of detecting the check-worthiness in multi-modal tweets. It takes advantage of two classifiers, each trained on a single modality. For image data, extracting the embedded text with an OCR analysis has been shown to perform best. By combining the two classifiers, the proposed solution was able to place first in the CheckThat! 2023 Task 1A with an F1 score of 0.7297 achieved on the private test set.", "output": "Fraunhofer SIT at CheckThat! 2023: Mixing Single-Modal Classifiers to Estimate the Check-Worthiness of Multi-Modal Tweets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects around their surroundings. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before sending it to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression codec to reduce the overhead in latency incurred for the decompression operation in the standard pipeline. 
We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor of up to $66\\times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$, as compared to $0.88$ achieved using decompressed images, while reducing the overall compute by $11\\%$.", "output": "Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes the second-placed approach developed by the Fraunhofer SIT team in the CLEF-2023 CheckThat! lab Task 1B for English. Given a text snippet from a political debate, the aim of this task is to determine whether it should be assessed for check-worthiness. Detecting check-worthy statements aims to facilitate manual fact-checking efforts by prioritizing the claims that fact-checkers should consider first. It can also be considered as a primary step of a fact-checking system. Our best-performing method took advantage of an ensemble classification scheme centered on Model Souping. When applied to the English data set, our submitted model achieved an overall F1 score of 0.878 and was ranked as the second-best model in the competition.", "output": "Fraunhofer SIT at CheckThat! 2023: Tackling Classification Uncertainty Using Model Souping on the Example of Check-Worthiness Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of privately clustering a dataset in $\\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically on the number $T$ of updates. The multiplicative error is almost the same as in the non-private setting. To do so, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem.", "output": "Differential Privacy for Clustering Under Continual Observation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Advanced computational methods are being actively sought for addressing the challenges associated with the discovery and development of new combinatorial materials such as formulations. A widely adopted approach involves domain-informed high-throughput screening of individual components that can be combined into a formulation. This manages to accelerate the discovery of new compounds for a target application but still leaves the process of identifying the right 'formulation' from the shortlisted chemical space largely a laboratory experiment-driven process. 
We report a deep learning model, Formulation Graph Convolution Network (F-GCN), that can map the structure-composition relationship of the individual components to the property of the liquid formulation as a whole. Multiple GCNs are assembled in parallel that featurize formulation constituents domain-intuitively on the fly. The resulting molecular descriptors are scaled based on the respective constituent's molar percentage in the formulation, followed by formalizing into a combined descriptor that represents a complete formulation to an external learning architecture. The use case of the proposed formulation learning model is demonstrated for battery electrolytes by training and testing it on two exemplary datasets representing electrolyte formulations vs battery performance -- one dataset is sourced from literature about Li/Cu half-cells, while the other is obtained by lab experiments related to lithium-iodide full-cell chemistry. The model is shown to predict performance metrics like Coulombic Efficiency (CE) and specific capacity of new electrolyte formulations with the lowest reported errors. The best-performing F-GCN model uses molecular descriptors derived from molecular graphs that are informed with HOMO-LUMO and electric moment properties of the molecules using a knowledge transfer technique.", "output": "Formulation Graphs for Mapping Structure-Composition of Battery Electrolytes to Device Performance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep-learning models for traffic data prediction can have superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback of these approaches is that most of them do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust in the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction with higher generalizability by introducing spectral normalization to its hidden layers. In our paper, we show that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both layer normalization and the model without normalization in single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal, but the availability of training data from multiple locations is limited. 
Spectral normalization, therefore, provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.", "output": "A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning (DL) approaches are being increasingly used for time-series forecasting, with many efforts devoted to designing complex DL models. Recent studies have shown that the DL success is often attributed to effective data representations, fostering the fields of feature engineering and representation learning. However, automated approaches for feature learning are typically limited with respect to incorporating prior knowledge, identifying interactions among variables, and choosing evaluation metrics to ensure that the models are reliable. To improve on these limitations, this paper contributes a novel visual analytics framework, namely TimeTuner, designed to help analysts understand how model behaviors are associated with localized correlations, stationarity, and granularity of time-series representations. The system mainly consists of the following two-stage technique: We first leverage counterfactual explanations to connect the relationships among time-series representations, multivariate features and model predictions. Next, we design multiple coordinated views including a partition-based correlation matrix and juxtaposed bivariate stripes, and provide a set of interactions that allow users to step into the transformation selection process, navigate through the feature space, and reason about the model performance. We instantiate TimeTuner with two transformation methods of smoothing and sampling, and demonstrate its applicability on real-world time-series forecasting of univariate sunspots and multivariate air pollutants. Feedback from domain experts indicates that our system can help characterize time-series representations and guide the feature engineering processes.", "output": "TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. 
Specifically, TwinLiteNet achieves an mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters, and achieves 415 FPS on an RTX A5000 GPU. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60 FPS on a Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: \\url{", "output": "TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing there exist data distributions where, for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\\Theta(\\alpha)$ loss in accuracy, where $\\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\\sqrt{\\alpha})$ loss, and give a matching $\\Omega(\\sqrt{\\alpha})$ lower bound. In contrast, Konstantinov and Lampert (2021) showed that for proper learners the loss in accuracy for both notions is $\\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple \"tricks\" an adversary can use to amplify his power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy clusters into three natural regimes: $O(\\alpha)$, $O(\\sqrt{\\alpha})$, and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.", "output": "On the Vulnerability of Fairness Constrained Learning to Malicious Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as a feature extractor for speech quality (SQ) prediction, which is, in turn, relevant for the assessment and training of speech enhancement systems for users with normal or impaired hearing. However, exact knowledge of why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving performance competitive with more complex systems. 
A detailed analysis of the performance depending on Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals.", "output": "Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have long faced the data and workload drift problem. Although both query-driven and hybrid methods are proposed to avoid this problem, even the state-of-the-art of them suffer from high training and estimation costs, limited scalability, instability, and the long-tailed distribution problem on high-cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process, which not only reduces the inference complexity from $O(n)$ to $O(1)$ compared to Naru and UAE but also achieves higher accuracy on high-cardinality and high-dimensional tables. Experimental results show that Duet can achieve all the design goals above and be much more practical, and even has a lower inference cost on CPU than that of most learned methods on GPU.", "output": "Duet: efficient and scalable hybriD neUral rElation undersTanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many properties in the real world, such as desirability or strength in a competitive environment, can't be directly observed, which makes them difficult to evaluate. To deal with this challenging problem, prior works have primarily focused on estimating those properties of known items, especially the strength of sports players, and only of those who appear in a paired comparison dataset. In this paper, we introduce Deep Bradley-Terry Rating (DBTR), a novel ML framework to evaluate any properties of unknown items, not necessarily present in the training data. Our method seamlessly integrates the traditional Bradley-Terry model with a neural network structure. We also generalize this architecture further for asymmetric environments with unfairness, which are much more common in real-world settings. In our experimental analysis, DBTR successfully learned the desired quantification of those properties.", "output": "Deep Bradley-Terry Rating: Estimate Properties Without Metric of Unseen Items."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. 
Another important tool for practical machine learning is the model Exponential Moving Average (EMA), which is a model copy that does not receive gradient information, but instead follows its target model with some momentum. This model EMA can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have treated the model EMA separately from optimization, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of model EMAs and demonstrate its validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, optimally a 6$\\times$ wall-clock time reduction.", "output": "How to Scale Your EMA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling, which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.", "output": "Pre-Training with Diffusion models for Dental Radiography segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The quantification of the entanglement present in a physical system is of paramount importance for fundamental research and many cutting-edge applications. Currently, achieving this goal requires either a priori knowledge of the system or very demanding experimental procedures such as full state tomography or collective measurements. Here, we demonstrate that by employing neural networks we can quantify the degree of entanglement without needing to know the full description of the quantum state. Our method allows for direct quantification of the quantum correlations using an incomplete set of local measurements. Despite using undersampled measurements, we achieve a quantification error up to an order of magnitude lower than the state-of-the-art quantum tomography. Furthermore, we achieve this result employing networks trained using exclusively simulated data. 
Finally, we derive a method based on a convolutional network input that can accept data from various measurement scenarios and perform, to some extent, independently of the measurement device.", "output": "Deep learning of quantum entanglement from incomplete measurements."}] \ No newline at end of file