diff --git a/LLMs-papers.md b/LLMs-papers.md
index a76e90b..65f8b8c 100644
--- a/LLMs-papers.md
+++ b/LLMs-papers.md
@@ -44,4 +44,5 @@
 28. [DPRK-BERT: The Supreme Language Model](http://arxiv.org/abs/2112.00567v1)
 29. [MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation](http://arxiv.org/abs/2208.08227v4)
-30. [Subword Segmental Language Modelling for Nguni Languages](http://arxiv.org/abs/2210.06525v1)
+30. [TabLLM: Few-shot Classification of Tabular Data with Large Language
+ Models](http://arxiv.org/abs/2210.10723v2)
diff --git a/data/2023-05-21.json b/data/2023-05-21.json
new file mode 100644
index 0000000..9b4be1d
--- /dev/null
+++ b/data/2023-05-21.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual text evokes an image in a person's mind, while non-visual text fails to do so. A method to automatically detect visualness in text will unlock the ability to augment text with relevant images, as neural text-to-image generation and retrieval models operate on the implicit assumption that the input text is visual in nature. We curate a dataset of 3,620 English sentences and their visualness scores provided by multiple human annotators. Additionally, we use documents that contain text and visual assets to create a distantly supervised corpus of document text and associated images. We also propose a fine-tuning strategy that adapts large vision-language models like CLIP that assume a one-to-one correspondence between text and image to the task of scoring text visualness from text input alone. Our strategy involves modifying the model's contrastive learning objective to map text identified as non-visual to a common NULL image while matching visual text to their corresponding images in the document. We evaluate the proposed approach on its ability to (i) classify visual and non-visual text accurately, and (ii) attend over words that are identified as visual in psycholinguistic studies. Empirical evaluation indicates that our approach performs better than several heuristics and baseline models for the proposed task. Furthermore, to highlight the importance of modeling the visualness of text, we conduct qualitative analyses of text-to-image generation systems like DALL-E.", "output": "Learning the Visualness of Text Using Large Vision-Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Generative Pre-trained Transformer models represent a notable breakthrough in the domain of natural language processing, which is propelling us toward the development of machines that can understand and communicate using language in a manner that closely resembles that of humans. Generative Pre-trained Transformer models are based on the transformer architecture, a deep neural network designed for natural language processing tasks. Due to their impressive performance on natural language processing tasks and ability to effectively converse, Generative Pre-trained Transformer models have gained significant popularity among researchers and industrial communities, making them one of the most widely used and effective models in natural language processing and related fields, which motivated us to conduct this review.
This review provides a detailed overview of the Generative Pre-trained Transformer, including its architecture, working process, training procedures, enabling technologies, and its impact on various applications. In this review, we also explore the potential challenges and limitations of a Generative Pre-trained Transformer. Furthermore, we discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of Generative Pre-trained Transformers, enabling technologies, their impact on various applications, emerging challenges, and potential solutions.", "output": "Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Internet of Things is transforming our society, providing new services that improve the quality of life and resource management. These applications are based on ubiquitous networks of multiple distributed devices, with limited computing resources and power, capable of collecting and storing data from heterogeneous sources in real-time. To avoid network saturation and high delays, new architectures such as fog computing are emerging to bring computing infrastructure closer to data sources. Additionally, new data centers are needed to provide real-time Big Data and data analytics capabilities at the edge of the network, where energy efficiency needs to be considered to ensure a sustainable and effective deployment in areas of human activity. In this research, we present an IoT model based on the principles of Model-Based Systems Engineering defined using the Discrete Event System Specification formalism. The provided mathematical formalism covers the description of the entire architecture, from IoT devices to the processing units in edge data centers. Our work includes the location-awareness of user equipment, network, and computing infrastructures to optimize federated resource management in terms of delay and power consumption. We present an effective framework to assist the dimensioning and the dynamic operation of IoT data stream analytics applications, demonstrating our contributions through a driving assistance use case based on real traces and data.", "output": "Bringing AI to the edge: A formal M&S specification to deploy effective IoT architectures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Word embeddings, i.e., semantically meaningful vector representation of words, are largely influenced by the distributional hypothesis \"You shall know a word by the company it keeps\" (Harris, 1954), whereas modern prediction-based neural network embeddings rely on design choices and hyperparameter optimization. Word embeddings like Word2Vec, GloVe, etc. well capture the contextuality and real-world analogies, but contemporary convolution-based image embeddings such as VGGNet, AlexNet, etc. do not capture contextual knowledge. The popular king-queen analogy does not hold true for most commonly used vision embeddings. In this paper, we introduce a pre-trained joint embedding (JE), named IMAGINATOR, trained on 21K distinct image objects level from 1M image+text pairs.
JE is a way to encode multimodal data into a vector space where the text modality serves as the grounding key, which the complementary modality (in this case, the image) is anchored with. IMAGINATOR encapsulates three individual representations: (i) object-object co-location, (ii) word-object co-location, and (iii) word-object correlation. These three ways capture complementary aspects of the two modalities which are further combined to obtain the final JEs. Generated JEs are intrinsically evaluated to assess how well they capture the contextuality and real-world analogies. We also evaluate pre-trained IMAGINATOR JEs on three downstream tasks: (i) image captioning, (ii) Image2Tweet, and (iii) text-based image retrieval. IMAGINATOR establishes a new standard on the aforementioned downstream tasks by outperforming the current SoTA on all the selected tasks. IMAGINATOR will be made publicly available. The codes are available at ", "output": "IMAGINATOR: Pre-Trained Image+Text Joint Embeddings using Word-Level Grounding of Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multicast communication technology is widely applied in wireless environments with a high device density. Traditional wireless network architectures have difficulty flexibly obtaining and maintaining global network state information and cannot quickly respond to network state changes, thus affecting the throughput, delay, and other QoS requirements of existing multicasting solutions. Therefore, this paper proposes a new multicast routing method based on multiagent deep reinforcement learning (MADRL-MR) in a software-defined wireless networking (SDWN) environment. First, SDWN technology is adopted to flexibly configure the network and obtain network state information in the form of traffic matrices representing global network link information, such as link bandwidth, delay, and packet loss rate. Second, the multicast routing problem is divided into multiple subproblems, which are solved through multiagent cooperation. To enable each agent to accurately understand the current network state and the status of multicast tree construction, the state space of each agent is designed based on the traffic and multicast tree status matrices, and the set of AP nodes in the network is used as the action space. A novel single-hop action strategy is designed, along with a reward function based on the four states that may occur during tree construction: progress, invalid, loop, and termination. Finally, a decentralized training approach is combined with transfer learning to enable each agent to quickly adapt to dynamic network changes and accelerate convergence. Simulation experiments show that MADRL-MR outperforms existing algorithms in terms of throughput, delay, packet loss rate, etc., and can establish more intelligent multicast routes.", "output": "Intelligent multicast routing method based on multi-agent deep reinforcement learning in SDWN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the highly dynamic changes in wireless network topologies, efficiently obtaining network status information and flexibly forwarding data to improve communication quality of service are important challenges.
This article introduces an intelligent routing algorithm (DRL-PPONSA) based on proximal policy optimization deep reinforcement learning with network situational awareness under a software-defined wireless networking architecture. First, a specific data plane is designed for network topology construction and data forwarding. The control plane collects network traffic information, sends flow tables, and uses a GCN-GRU prediction mechanism to perceive future traffic change trends to achieve network situational awareness. Second, a DRL-based data forwarding mechanism is designed in the knowledge plane. The predicted network traffic matrix and topology information matrix are treated as the environment for DRL agents, while next-hop adjacent nodes are treated as executable actions. Accordingly, action selection strategies are designed for different network conditions to achieve more intelligent, flexible, and efficient routing control. The reward function is designed using network link information and various reward and penalty mechanisms. Additionally, importance sampling and gradient clipping techniques are employed during gradient updating to enhance convergence speed and stability. Experimental results show that DRL-PPONSA outperforms traditional routing methods in network throughput, delay, packet loss rate, and wireless node distance. Compared to value-function-based Dueling DQN routing, the convergence speed is significantly improved, and the convergence effect is more stable. Simultaneously, its consumption of hardware storage space is reduced, and efficient routing decisions can be made in real-time using the current network state information.", "output": "An Intelligent SDWN Routing Algorithm Based on Network Situational Awareness and Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fully autonomous driving has been widely studied and is becoming increasingly feasible. However, such autonomous driving has yet to be achieved on public roads, because of various uncertainties due to surrounding human drivers and pedestrians. In this paper, we present an end-to-end learning-based autonomous driving system named SuperDriver AI, where Deep Neural Networks (DNNs) learn the driving actions and policies from the experienced human drivers and determine the driving maneuvers to take while guaranteeing road safety. In addition, to improve robustness and interpretability, we present a slit model and a visual attention module. We build a data collection system and emulator with real-world hardware, and we also test the SuperDriver AI system with real-world driving scenarios. Finally, we have collected 150 runs for one driving scenario in Tokyo, Japan, and have shown the demonstration of SuperDriver AI with the real-world vehicle.", "output": "SuperDriverAI: Towards Design and Implementation for End-to-End Learning-based Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given the importance of forests and their role in maintaining the ecological balance, which directly affects the planet, the climate, and the life on this planet, this research presents the problem of forest fire monitoring using drones. The forest monitoring process is performed continuously to track any changes in the monitored region within the forest.
During fires, drones' captured data is used to increase the follow-up speed and enhance the control process of these fires to prevent their spread. The time factor in such problems determines the success rate of the fire extinguishing process, as appropriate data at the right time may be the decisive factor in controlling fires, preventing their spread, extinguishing them, and limiting their losses. Therefore, this research presented the problem of monitoring task scheduling for drones in the forest monitoring system. This problem is solved by developing several algorithms with the aim of minimizing the total completion time required to carry out all the drones' assigned tasks. System performance is measured by using 990 instances of three different classes. The experimental results indicated the effectiveness of the proposed algorithms and their ability to act efficiently to achieve the desired goal. The algorithm $RID$ achieved the best performance with a percentage rate of up to 90.3% with a time of 0.088 seconds.", "output": "Optimizing Forest Fire Prevention: Intelligent Scheduling Algorithms for Drone-Based Surveillance System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over-parameterized neural language models (LMs) can memorize and recite long sequences of training data. While such memorization is normally associated with undesired properties such as overfitting and information leaking, our work casts memorization as an unexplored capability of LMs. We propose the first symmetric encryption algorithm with autoregressive language models (SELM). We show that autoregressive LMs can encode arbitrary data into a compact real-valued vector (i.e., encryption) and then losslessly decode the vector to the original message (i.e., decryption) via random subspace optimization and greedy decoding. While SELM is not amenable to conventional cryptanalysis, we investigate its security through a novel empirical variant of the classic IND-CPA (indistinguishability under chosen-plaintext attack) game. Our code and datasets are available at ", "output": "Memorization for Good: Encryption with Autoregressive Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Emotion regulation is a crucial element in dealing with emotional events and has positive effects on mental health. This paper aims to provide a more comprehensive understanding of emotional events by introducing a new French corpus of emotional narratives collected using a questionnaire for emotion regulation. We follow the theoretical framework of the Component Process Model which considers emotions as dynamic processes composed of four interrelated components (behavior, feeling, thinking and territory). Each narrative is related to a discrete emotion and is structured based on all emotion components by the writers. We study the interaction of components and their impact on emotion classification with machine learning methods and pre-trained language models.
Our results show that each component improves prediction performance, and that the best results are achieved by jointly considering all components. Our results also show the effectiveness of pre-trained language models in predicting discrete emotion from certain components, which reveal differences in how emotion components are expressed.", "output": "Emotion Recognition based on Psychological Components in Guided Narratives for Emotion Regulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks and in particular the attention mechanism have brought significant advances to the field of Automated Essay Scoring. Many of these systems use a regression-based model which may be prone to underfitting when the model only predicts the mean of the training data. In this paper, we present a dynamic loss function that creates an incentive for the model to predict with the correct distribution, as well as predicting the correct values. Our loss function achieves this goal without sacrificing any performance, achieving a Quadratic Weighted Kappa score of 0.752 on the Automated Student Assessment Prize Automated Essay Scoring dataset.", "output": "The Effectiveness of a Dynamic Loss Function in Neural Network Based Automated Essay Scoring."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents GenDoc, a general sequence-to-sequence document understanding model pre-trained with unified masking across three modalities: text, image, and layout. The proposed model utilizes an encoder-decoder architecture, which allows for increased adaptability to a wide range of downstream tasks with diverse output formats, in contrast to the encoder-only models commonly employed in document understanding. In addition to the traditional text infilling task used in previous encoder-decoder models, our pre-training extends to include tasks of masked image token prediction and masked layout prediction. We also design modality-specific instruction and adopt both disentangled attention and the mixture-of-modality-experts strategy to effectively capture the information leveraged by each modality. Evaluation of the proposed model through extensive experiments on several downstream tasks in document understanding demonstrates its ability to achieve superior or competitive performance compared to state-of-the-art approaches. Our analysis further suggests that GenDoc is more robust than the encoder-only models in scenarios where the OCR quality is imperfect.", "output": "Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Going beyond 'dendritic democracy', we introduce a 'democracy of local processors', termed Cooperator.
Here we compare their capabilities when used in permutation-invariant neural networks for reinforcement learning (RL), with machine learning algorithms based on Transformers, such as ChatGPT. Transformers are based on the long-standing conception of integrate-and-fire 'point' neurons, whereas Cooperator is inspired by recent neurobiological breakthroughs suggesting that the cellular foundations of mental life depend on context-sensitive pyramidal neurons in the neocortex which have two functionally distinct points. We show that when used for RL, an algorithm based on Cooperator learns far quicker than that based on Transformer, even while having the same number of parameters.", "output": "Cooperation Is All You Need."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots. The purpose of this document is to share the excitement of the authors with the community and highlight a promising research direction in robotics and AI. The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research: deploying robots, or embodied AI agents more broadly, in various non-factory real-world settings to perform diverse tasks. This document presents a specific idea for mining knowledge in the latest large-scale foundation models for robotics research. Instead of directly adapting these models or using them to guide low-level policy learning, it advocates for using them to generate diversified tasks and scenes at scale, thereby scaling up low-level skill learning and ultimately leading to a foundation model for robotics that empowers generalist robots. The authors are actively pursuing this direction, but in the meantime, they recognize that the ambitious goal of building generalist robots with large-scale policy training demands significant resources such as computing power and hardware, and research groups in academia alone may face severe resource constraints in implementing the entire vision. Therefore, the authors believe sharing their thoughts at this early stage could foster discussions, attract interest towards the proposed pathway and related topics from industry groups, and potentially spur significant technical advancements in the field.", "output": "Towards A Foundation Model for Generalist Robots: Diverse Skill Learning at Scale via Automated Task and Scene Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Estimating the 3DoF rotation from a single RGB image is an important yet challenging problem. As a popular approach, probabilistic rotation modeling additionally carries prediction uncertainty information, compared to single-prediction rotation regression. For modeling probabilistic distribution over SO(3), it is natural to use Gaussian-like Bingham distribution and matrix Fisher; however, they are shown to be sensitive to outlier predictions, e.g. $180^\\circ$ error, and thus are unlikely to converge with optimal performance. In this paper, we draw inspiration from multivariate Laplace distribution and propose a novel rotation Laplace distribution on SO(3).
Our rotation Laplace distribution is robust to the disturbance of outliers and enforces much gradient to the low-error region that it can improve. In addition, we show that our method also exhibits robustness to small noises and thus tolerates imperfect annotations. With this benefit, we demonstrate its advantages in semi-supervised rotation regression, where the pseudo labels are noisy. To further capture the multi-modal rotation solution space for symmetric objects, we extend our distribution to a rotation Laplace mixture model and demonstrate its effectiveness. Our extensive experiments show that our proposed distribution and the mixture model achieve state-of-the-art performance in all the rotation regression experiments over both probabilistic and non-probabilistic baselines.", "output": "Towards Robust Probabilistic Modeling on SO(3) via Rotation Laplace Distribution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lung cancer (LC) is a type of malignant neoplasm that originates in the bronchial mucosa or glands. As a clinically common nodule, solitary pulmonary nodules (SPNs) have a significantly higher probability of malignancy when they are larger than 8 mm in diameter. But there is also a risk of lung cancer when the diameter is less than 8 mm, so the purpose of this study was to create a nomogram for estimating the likelihood of lung cancer in patients with SPNs of 8 mm or smaller using computed tomography (CT) scans and biomarker information. We use CT scans and various biomarkers as input to build predictive models for the likelihood of lung cancer in patients with SPNs of 8 mm or less. Age, precursor gastrin-releasing peptide (ProGRP), gender, Carcinoembryonic Antigen (CEA), and squamous cell carcinoma antigen (SCC) were independent key tumor markers and were entered into the nomogram. The developed nomogram demonstrated strong accuracy in predicting lung cancer risk, with an internal validation area under the receiver operating characteristic curve (ROC) of 0.8474. The calibration curves plotted showed that the nomogram predicted the probability of lung cancer with good agreement with the actual probability. In this study, we finally succeeded in constructing a suitable nomogram that could predict the risk of lung cancer in patients with SPNs <= 8 mm in diameter. The model has a high level of accuracy and is able to accurately distinguish between different patients, allowing clinicians to develop personalized treatment plans for individuals with SPNs.", "output": "Solitary pulmonary nodules prediction for lung cancer patients using nomogram and machine learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this note we present a fully information theoretic approach to renormalization inspired by Bayesian statistical inference, which we refer to as Bayesian Renormalization. The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale quantifying the distinguishability between nearby points in the space of probability distributions. This RG scale can be interpreted as a proxy for the maximum number of unique observations that can be made about a given system during a statistical inference experiment.
The role of the Bayesian Renormalization scheme is subsequently to prepare an effective model for a given system up to a precision which is bounded by the aforementioned scale. In applications of Bayesian Renormalization to physical systems, the emergent information theoretic scale is naturally identified with the maximum energy that can be probed by current experimental apparatus, and thus Bayesian Renormalization coincides with ordinary renormalization. However, Bayesian Renormalization is sufficiently general to apply even in circumstances in which an immediate physical scale is absent, and thus provides an ideal approach to renormalization in data science contexts. To this end, we provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation such as the information bottleneck and the diffusion learning paradigm.", "output": "Bayesian Renormalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Feature attribution methods (FAs) are popular approaches for providing insights into the model reasoning process of making predictions. The more faithful a FA is, the more accurately it reflects which parts of the input are more important for the prediction. Widely used faithfulness metrics, such as sufficiency and comprehensiveness, use a hard erasure criterion, i.e. entirely removing or retaining the top most important tokens ranked by a given FA and observing the changes in predictive likelihood. However, this hard criterion ignores the importance of each individual token, treating them all equally for computing sufficiency and comprehensiveness. In this paper, we propose a simple yet effective soft erasure criterion. Instead of entirely removing or retaining tokens from the input, we randomly mask parts of the token vector representations proportionately to their FA importance. Extensive experiments across various natural language processing tasks and different FAs show that our soft-sufficiency and soft-comprehensiveness metrics consistently prefer more faithful explanations compared to hard sufficiency and comprehensiveness. Our code: ", "output": "Incorporating Attribution Importance for Improving Faithfulness Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently Transformer and Convolution neural network (CNN) based models have shown promising results in EEG signal processing. Transformer models can capture the global dependencies in EEG signals through a self-attention mechanism, while CNN models can capture local features such as sawtooth waves. In this work, we propose an end-to-end neural epilepsy detection model, EENED, that combines CNN and Transformer. Specifically, by introducing the convolution module into the Transformer encoder, EENED can learn the time-dependent relationship of the patient's EEG signal features and notice local EEG abnormal mutations closely related to epilepsy, such as the appearance of spikes and the sprinkling of sharp and slow waves.
Our proposed framework combines the ability of Transformer and CNN to capture different scale features of EEG signals and holds promise for improving the accuracy and reliability of epilepsy detection. Our source code will be released soon on GitHub.", "output": "EENED: End-to-End Neural Epilepsy Detection based on Convolutional Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on the robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence and Wasserstein distance.", "output": "Model-Free Robust Average-Reward Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The large-scale deployment of autonomous vehicles is yet to come, and one of the major remaining challenges lies in urban dense traffic scenarios. In such cases, it remains challenging to predict the future evolution of the scene and future behaviors of objects, and to deal with rare adverse events such as the sudden appearance of occluded objects. In this paper, we present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene. By reasoning on the temporal behavior of objects, our method can effectively process the interactions and relationships among features in different frames. Reasoning about the global information of the scene can also improve overall perception performance and benefit the detection of adverse events, especially the anticipation of potential danger from occluded objects. For comprehensive evaluation on occlusion events, we also release publicly a driving simulation benchmark DriveOcclusionSim consisting of diverse occlusion events. We conduct extensive experiments on multiple CARLA benchmarks, where our model outperforms all prior methods, ranking first on the sensor track of the public CARLA Leaderboard.", "output": "ReasonNet: End-to-End Driving with Temporal and Global Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this multicultural age, language translation is one of the most performed tasks, and it is becoming increasingly AI-moderated and automated. As a novel AI system, ChatGPT claims to be proficient in such translation tasks and in this paper, we put that claim to the test. Specifically, we examine ChatGPT's accuracy in translating between English and languages that exclusively use gender-neutral pronouns.
We center this study around Bengali, the 7$^{th}$ most spoken language globally, but also generalize our findings across five other languages: Farsi, Malay, Tagalog, Thai, and Turkish. We find that ChatGPT perpetuates gender defaults and stereotypes assigned to certain occupations (e.g. man = doctor, woman = nurse) or actions (e.g. woman = cook, man = go to work), as it converts gender-neutral pronouns in languages to `he' or `she'. We also observe ChatGPT completely failing to translate the English gender-neutral pronoun `they' into equivalent gender-neutral pronouns in other languages, as it produces translations that are incoherent and incorrect. While it does respect and provide appropriately gender-marked versions of Bengali words when prompted with gender information in English, ChatGPT appears to confer a higher respect to men than to women in the same occupation. We conclude that ChatGPT exhibits the same gender biases which have been demonstrated for tools like Google Translate or MS Translator, and we provide recommendations for a human-centered approach for future designers of AIs that perform language translation to better accommodate such low-resource languages.", "output": "ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Off-Policy reinforcement learning has been a driving force for the state-of-the-art conversational AIs leading to more natural human-agent interactions and improving the user satisfaction for goal-oriented agents. However, in large-scale commercial settings, it is often challenging to balance between policy improvements and experience continuity on the broad spectrum of applications handled by such a system. In the literature, off-policy evaluation and guard-railing on aggregate statistics has been commonly used to address this problem. In this paper, we propose a method for curating and leveraging high-precision samples sourced from historical regression incident reports to validate, safe-guard, and improve policies prior to the online deployment. We conducted extensive experiments using data from a real-world conversational system and actual regression incidents. The proposed method is currently deployed in our production system to protect customers against broken experiences and enable long-term policy improvements.", "output": "Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian networks (BN) are directed acyclic graphical (DAG) models that have been adopted into many fields for their strengths in transparency, interpretability, probabilistic reasoning, and causal modeling. Given a set of data, one hurdle towards using BNs is in building the network graph from the data that properly handles dependencies, whether correlated or causal.
In this paper, we propose an initial methodology for discovering network structures using Tsetlin Machines.", "output": "Generating Bayesian Network Models from Data Using Tsetlin Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Graph-Induced Sum-Product Networks (GSPNs), a new probabilistic framework for graph representation learning that can tractably answer probabilistic queries. Inspired by the computational trees induced by vertices in the context of message-passing neural networks, we build hierarchies of sum-product networks (SPNs) where the parameters of a parent SPN are learnable transformations of the a-posterior mixing probabilities of its children's sum units. Due to weight sharing and the tree-shaped computation graphs of GSPNs, we obtain the efficiency and efficacy of deep graph networks with the additional advantages of a purely probabilistic model. We show the model's competitiveness on scarce supervision scenarios, handling missing data, and graph classification in comparison to popular neural models. We complement the experiments with qualitative analyses on hyper-parameters and the model's ability to answer probabilistic queries.", "output": "Tractable Probabilistic Graph Representation Learning with Graph-Induced Sum-Product Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The discovery of individual objectives in collective behavior of complex dynamical systems such as fish schools and bacteria colonies is a long-standing challenge. Inverse reinforcement learning is a potent approach for addressing this challenge but its applicability to dynamical systems, involving continuous state-action spaces and multiple interacting agents, has been limited. In this study, we tackle this challenge by introducing an off-policy inverse multi-agent reinforcement learning algorithm (IMARL). Our approach combines the ReF-ER techniques with guided cost learning. By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents. Through extensive experimentation, we demonstrate that the proposed policy captures the behavior observed in the provided data, and achieves promising results across problem domains including single agent models in the OpenAI gym and multi-agent models of schooling behavior. The present study shows that the proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents, and showcases its value as a tool for studying complex physical systems exhibiting collective behaviour.", "output": "Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Urban air mobility (UAM) has the potential to revolutionize our daily transportation, offering rapid and efficient deliveries of passengers and cargo between dedicated locations within and around the urban environment.
Before the commercialization and adoption of this emerging transportation mode, however, aviation safety must be guaranteed, i.e., all the aircraft have to be safely separated by strategic and tactical deconfliction. Reinforcement learning has demonstrated effectiveness in the tactical deconfliction of en route commercial air traffic in simulation. However, its performance is found to be dependent on the traffic density. In this project, we propose a novel framework that combines demand capacity balancing (DCB) for strategic conflict management and reinforcement learning for tactical separation. By using DCB to precondition traffic to proper density levels, we show that reinforcement learning can achieve much better performance for tactical safety separation. Our results also indicate that this DCB preconditioning can allow target levels of safety to be met that are otherwise impossible. In addition, combining strategic DCB with reinforcement learning for tactical separation can meet these safety levels while achieving greater operational efficiency than alternative solutions.", "output": "Integrated Conflict Management for UAM with Strategic Demand Capacity Balancing and Learning-based Tactical Deconfliction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The success of the knowledge graph completion task heavily depends on the quality of the knowledge graph embeddings (KGEs), which relies on self-supervised learning and augmenting the dataset with negative triples. There is a gap in the literature between the theoretical analysis of negative samples on contrastive loss and the heuristic generation of quality (i.e., hard) negative triples. In this paper, we modify the InfoNCE loss to explicitly account for the negative sample distribution. We show minimizing InfoNCE loss with hard negatives maximizes the KL-divergence between the given and negative triple embedding. However, we also show that hard negatives can lead to false negatives (i.e., accidentally factual triples) and reduce downstream task performance. To address this issue, we propose a novel negative sample distribution that uses the graph structure of the knowledge graph to remove the false negative triples. We call our algorithm Hardness and Structure-aware (\\textbf{HaSa}) contrastive KGE. Experiments show that our method outperforms state-of-the-art KGE methods in several metrics for the WN18RR and FB15k-237 datasets.", "output": "Investigating the Effect of Hard Negative Sample Distribution on Contrastive Knowledge Graph Embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating black-box abstaining classifier(s), however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions are crucial when, e.g., a radiologist is unsure of their diagnosis or when a driver is inattentive in a self-driving car.
In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach is centered around defining the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Note that, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. Our approach is examined in both simulated and real data experiments.", "output": "Counterfactually Comparing Abstaining Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative AI models like DALL-E 2 can interpret textual prompts and generate high-quality images exhibiting human creativity. Though public enthusiasm is booming, systematic auditing of potential gender biases in AI-generated images remains scarce. We addressed this gap by examining the prevalence of two occupational gender biases (representational and presentational biases) in 15,300 DALL-E 2 images spanning 153 occupations, and assessed potential bias amplification by benchmarking against 2021 census labor statistics and Google Images. Our findings reveal that DALL-E 2 underrepresents women in male-dominated fields while overrepresenting them in female-dominated occupations. Additionally, DALL-E 2 images tend to depict more women than men with smiling faces and downward-pitching heads, particularly in female-dominated (vs. male-dominated) occupations. Our computational algorithm auditing study demonstrates more pronounced representational and presentational biases in DALL-E 2 compared to Google Images and calls for feminist interventions to prevent such bias-laden AI-generated images from feeding back into the media ecology.", "output": "Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present Sim-MEES: a large-scale synthetic dataset that contains 1,550 objects with varying difficulty levels and physics properties, as well as 11 million grasp labels for mobile manipulators to plan grasps using different gripper modalities in cluttered environments. Our dataset generation process combines analytic models and dynamic simulations of the entire cluttered environment to provide accurate grasp labels.
We provide a detailed study of our proposed labeling process for both parallel jaw grippers and suction cup grippers, comparing them with state-of-the-art methods to demonstrate how Sim-MEES can provide precise grasp labels in cluttered environments.", "output": "Sim-MEES: Modular End-Effector System Grasping Dataset for Mobile Manipulators in Cluttered Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts:", "output": "Tree of Thoughts: Deliberate Problem Solving with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While there is much recent interest in studying why Transformer-based large language models make predictions the way they do, the complex computations performed within each layer have traditionally posed a strong bottleneck. To mitigate this shortcoming, this work presents a linear decomposition of final hidden states from autoregressive language models based on each initial input token, which is exact for virtually all contemporary Transformer architectures. This decomposition allows the definition of probability distributions that ablate the contribution of specific input tokens, which can be used to analyze their influence on model probabilities over a sequence of upcoming words with only one forward pass from the model. Using the change in next-word probability as a measure of importance, this work first examines which context words make the biggest contribution to language model predictions. Regression experiments suggest that Transformer-based language models rely primarily on collocational associations, followed by linguistic factors such as syntactic dependencies and coreference relationships in making next-word predictions.
Additionally, analyses using these measures to predict syntactic dependencies and coreferent mention spans show that collocational association and repetitions of the same token, respectively, largely explain the language model's predictions on the tasks.", "output": "Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of root cause analysis (RCA) is to identify the root causes of system faults/failures by analyzing system monitoring data. Efficient RCA can greatly accelerate system failure recovery and mitigate system damages or financial losses. However, previous research has mostly focused on developing offline RCA algorithms, which often require manually initiating the RCA process, a significant amount of time and data to train a robust model, and then being retrained from scratch for a new system fault. In this paper, we propose CORAL, a novel online RCA framework that can automatically trigger the RCA process and incrementally update the RCA model. CORAL consists of Trigger Point Detection, Incremental Disentangled Causal Graph Learning, and Network Propagation-based Root Cause Localization. The Trigger Point Detection component aims to detect system state transitions automatically and in near-real-time. To achieve this, we develop an online trigger point detection approach based on multivariate singular spectrum analysis and cumulative sum statistics. To efficiently update the RCA model, we propose an incremental disentangled causal graph learning approach to decouple the state-invariant and state-dependent information. After that, CORAL applies a random walk with restarts to the updated causal graph to accurately identify root causes. The online RCA process terminates when the causal graph and the generated root cause list converge. Extensive experiments on three real-world datasets with case studies demonstrate the effectiveness and superiority of the proposed framework.", "output": "Incremental Causal Graph Learning for Online Unsupervised Root Cause Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models, e.g. ChatGPT, are currently contributing enormously to making artificial intelligence even more popular, especially among the general population. However, such chatbot models were developed as tools to support natural language communication between humans. Problematically, it is very much a ``statistical correlation machine\" (correlation instead of causality) and there are indeed ethical concerns associated with the use of AI language models such as ChatGPT, such as Bias, Privacy, and Abuse. This paper highlights specific ethical concerns on ChatGPT and articulates key challenges when ChatGPT is used in various applications. Practical commandments for different stakeholders of ChatGPT are also proposed that can serve as checklist guidelines for those applying ChatGPT in their applications.
These commandment examples are expected to motivate the ethical use of ChatGPT.", "output": "Ethical ChatGPT: Concerns, Challenges, and Commandments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities from biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data due to the highly specialized and expert knowledge required for annotation. Though data augmentation has been shown to be highly effective for low-resource NER in general, existing data augmentation techniques fail to produce factual and diverse augmentations for BioNER. In this paper, we present BioAug, a novel data augmentation framework for low-resource BioNER. BioAug, built on BART, is trained to solve a novel text reconstruction task based on selective masking and knowledge augmentation. Post training, we perform conditional generation and generate diverse augmentations conditioning BioAug on selectively corrupted text similar to the training stage. We demonstrate the effectiveness of BioAug on 5 benchmark BioNER datasets and show that BioAug outperforms all our baselines by a significant margin (1.5%-21.5% absolute improvement) and is able to generate augmentations that are both more factual and diverse. Code:", "output": "BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is not uncommon that real-world data are distributed with a long tail. For such data, the learning of deep neural networks becomes challenging because it is hard to classify tail classes correctly. In the literature, several existing methods have addressed this problem by reducing classifier bias, provided that the features obtained with long-tailed data are representative enough. However, we find that training directly on long-tailed data leads to an uneven embedding space. That is, the embedding space of head classes severely compresses that of tail classes, which is not conducive to subsequent classifier learning. This paper therefore studies the problem of long-tailed visual recognition from the perspective of the feature level. We introduce feature augmentation to balance the embedding distribution. The features of different classes are perturbed with varying amplitudes in Gaussian form. Based on these perturbed features, two novel logit adjustment methods are proposed to improve model performance at a modest computational overhead. Subsequently, the distorted embedding spaces of all classes can be calibrated. In such balanced-distributed embedding spaces, the biased classifier can be eliminated by simply retraining the classifier with class-balanced sampling data.
Extensive experiments conducted on benchmark datasets demonstrate the superior performance of the proposed method over the state-of-the-art ones.", "output": "Adjusting Logit in Gaussian Form for Long-Tailed Visual Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There have been increasing challenges to dual-system descriptions of System-1 and System-2, critiquing them as imprecise and fostering misconceptions. We address these issues here by way of Dennett's appeal to use computational thinking as an analytical tool, specifically we employ the Common Model of Cognition. Results show that the characteristics thought to be distinctive of System-1 and System-2 instead form a spectrum of cognitive properties. By grounding System-1 and System-2 in the Common Model we aim to clarify their underlying mechanisms, persisting misconceptions, and implications for metacognition.", "output": "Clarifying System 1 & 2 through the Common Model of Cognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior research on addressing this issue focused on using speaker-identity only. To this end, this paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognition: a) multitask training incorporating severity prediction error; b) speaker-severity aware auxiliary feature adaptation; and c) structured LHUC transforms separately conditioned on speaker-identity and severity. Experiments conducted on UASpeech suggest incorporating additional speech impairment severity into state-of-the-art hybrid DNN, E2E Conformer and pre-trained Wav2vec 2.0 ASR systems produced statistically significant WER reductions of up to 4.78% (14.03% relative). Using the best system, the lowest published WER of 17.82% (51.25% on very low intelligibility) was obtained on UASpeech.", "output": "Use of Speech Impairment Severity for Dysarthric Speech Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly supervised learning based on scribble annotations in target extraction of remote sensing images has drawn much interest due to scribbles' flexibility in denoting winding objects and the low cost of manual labeling. However, scribbles are too sparse to identify object structure and detailed information, bringing great challenges in target localization and boundary description. To alleviate these problems, in this paper, we construct two inner structure-constraints, a deformation consistency loss and a trainable active contour loss, together with a scribble-constraint to supervise the optimization of the encoder-decoder network without introducing any auxiliary module or extra operation based on prior cues.
Comprehensive experiments demonstrate our method's superiority over five state-of-the-art algorithms in this field. Source code is available at ", "output": "Scribble-Supervised Target Extraction Method Based on Inner Structure-Constraint for Remote Sensing Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic, demonstrating their ability to deceive human perception and deep neural networks with stealth and success. However, current works usually sacrifice unrestricted degrees and subjectively select some image content to guarantee the photorealism of unrestricted adversarial examples, which limits their attack performance. To ensure the photorealism of adversarial examples and boost attack performance, we propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack. By leveraging a low-dimensional manifold that represents natural images, we map the images onto the manifold and optimize them along its adversarial direction. Therefore, within this framework, we implement Adversarial Content Attack (ACA) based on Stable Diffusion and can generate highly transferable unrestricted adversarial examples with various adversarial contents. Extensive experimentation and visualization demonstrate the efficacy of ACA, particularly in surpassing state-of-the-art attacks by an average of 13.3-50.4% and 16.8-48.0% on normally trained models and defense methods, respectively.", "output": "Content-based Unrestricted Adversarial Attack."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph anomaly detection has long been an important problem in various domains pertaining to information security such as financial fraud, social spam, network intrusion, etc. The majority of existing methods are performed in an unsupervised manner, as labeled anomalies at a large scale are often too expensive to acquire. However, the identified anomalies may turn out to be data noise or uninteresting data instances due to the lack of prior knowledge on the anomalies. In realistic scenarios, it is often feasible to obtain limited labeled anomalies, which have great potential to advance graph anomaly detection. However, the work exploring limited labeled anomalies and a large amount of unlabeled nodes in graphs to detect anomalies is rather limited. Therefore, in this paper, we study a novel problem of few-shot graph anomaly detection.
We propose a new framework, MetaGAD, to learn to meta-transfer the knowledge between unlabeled and labeled nodes for graph anomaly detection. Experimental results on six real-world datasets with synthetic anomalies and \"organic\" anomalies (available in the dataset) demonstrate the effectiveness of the proposed approach in detecting anomalies with limited labeled anomalies.", "output": "MetaGAD: Learning to Meta Transfer for Few-shot Graph Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The prevalence of large-scale graphs poses great challenges in time and storage for training and deploying graph neural networks (GNNs). Several recent works have explored solutions for pruning the large original graph into a small and highly-informative one, such that training and inference on the pruned and large graphs have comparable performance. Although empirically effective, current research focuses on static or non-temporal graphs, which are not directly applicable to dynamic scenarios. In addition, they require labels as ground truth to learn the informative structure, limiting their applicability to new problem domains where labels are hard to obtain. To solve the dilemma, we propose and study the problem of unsupervised graph pruning on dynamic graphs. We approach the problem by our proposed STEP, a self-supervised temporal pruning framework that learns to remove potentially redundant edges from input dynamic graphs. From a technical and industrial viewpoint, our method overcomes the trade-offs between the performance and the time & memory overheads. Our results on three real-world datasets demonstrate the advantages in improving the efficacy, robustness, and efficiency of GNNs on dynamic node classification tasks. Most notably, STEP is able to prune more than 50% of edges on a million-scale industrial graph Alipay (7M nodes, 21M edges) while approximating up to 98% of the original performance. Code is available at", "output": "Less Can Be More: Unsupervised Graph Pruning for Large-scale Dynamic Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Code generation aims to automatically generate source code from high-level task specifications, which can significantly increase the productivity of software engineering. Recently, approaches based on large language models (LLMs) have shown remarkable code generation abilities on simple tasks. However, generating code for more complex tasks, such as competition-level problems, remains challenging. In this paper, we introduce the Brainstorm framework for code generation. It leverages a brainstorming step that generates and selects diverse thoughts on the problem to facilitate algorithmic reasoning, where the thoughts are possible blueprints for solving the problem. We demonstrate that Brainstorm significantly enhances the ability of LLMs to solve competition-level programming problems, resulting in a more than 50% increase in the pass@$k$ metrics for ChatGPT on the CodeContests benchmark, achieving state-of-the-art performance.
Furthermore, our experiments conducted on LeetCode contests show that our framework boosts the ability of ChatGPT to a level comparable to that of human programmers.", "output": "Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The uptake of health insurance has been poor in Nigeria; a significant step to improving this includes improved awareness, access to information, and tools to support decision making. Artificial intelligence (AI) based recommender systems have gained popularity in helping individuals find movies, books, music, and different types of products on the internet, including diverse applications in healthcare. The content-based methodology (item-based approach) was employed in the recommender system. We applied both the K-Nearest Neighbor (KNN) and Cosine similarity algorithms. We chose Cosine similarity as our algorithm after several evaluations based on their outcomes in comparison with domain knowledge. The recommender system takes into consideration the choices entered by the user and filters the health management organization (HMO) data by location and chosen prices. It then recommends the top 3 HMOs with the closest similarity in services offered. A recommendation tool to help people find and select the best health insurance plan for them is useful in reducing the barrier of accessing health insurance. Users are empowered to easily find appropriate information on available plans, reduce cognitive overload in dealing with over 100 options available in the market, and easily see what matches their financial capacity.", "output": "Machine Learning Recommendation System For Health Insurance Decision Making In Nigeria."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance. Due to the lack of suitable datasets, previous studies have frequently employed synthetic label noise to mimic real-world label noise. However, synthetic noise is not instance-dependent, making this approximation not always effective in practice. Recent research has proposed benchmarks for learning with real-world noisy labels. However, the noise sources within may be single or fuzzy, making those benchmarks different from data with heterogeneous label noise in the real world. To tackle these issues, we contribute NoisywikiHow, the largest NLP benchmark built with minimal supervision. Specifically, inspired by human cognition, we explicitly construct multiple sources of label noise to imitate human errors throughout the annotation, replicating real-world noise, whose corruption is affected by both ground-truth labels and instances.
Moreover, we provide a variety of noise levels to support controlled experiments on noisy data, enabling us to evaluate LNL methods systematically and comprehensively. After that, we conduct extensive multi-dimensional experiments on a broad range of LNL methods, obtaining new and intriguing findings.", "output": "NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time-Series Mining (TSM) is an important research area since it shows great potential in practical applications. Deep learning models that rely on massive labeled data have been utilized for TSM successfully. However, constructing a large-scale well-labeled dataset is difficult due to data annotation costs. Recently, Pre-Trained Models have gradually attracted attention in the time series domain due to their remarkable performance in computer vision and natural language processing. In this survey, we provide a comprehensive review of Time-Series Pre-Trained Models (TS-PTMs), aiming to guide the understanding, application, and study of TS-PTMs. Specifically, we first briefly introduce the typical deep learning models employed in TSM. Then, we give an overview of TS-PTMs according to the pre-training techniques. The main categories we explore include supervised, unsupervised, and self-supervised TS-PTMs. Further, extensive experiments are conducted to analyze the advantages and disadvantages of transfer learning strategies, Transformer-based models, and representative TS-PTMs. Finally, we point out some potential directions of TS-PTMs for future work.", "output": "A Survey on Time-Series Pre-Trained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Long-term time series forecasting has gained significant attention in recent years. While there are various specialized designs for capturing temporal dependency, previous studies have demonstrated that a single linear layer can achieve competitive forecasting performance compared to other complex architectures. In this paper, we thoroughly investigate the intrinsic effectiveness of recent approaches and make three key observations: 1) linear mapping is critical to prior long-term time series forecasting efforts; 2) RevIN (reversible normalization) and CI (Channel Independent) play a vital role in improving overall forecasting performance; and 3) linear mapping can effectively capture periodic features in time series and has robustness for different periods across channels when increasing the input horizon. We provide theoretical and experimental explanations to support our findings and also discuss the limitations and future work. Our framework's code is available at url{", "output": "Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel framework, i.e., Segment Any Anomaly + (SAA+), for zero-shot anomaly segmentation with hybrid prompt regularization to improve the adaptability of modern foundation models.
Existing anomaly segmentation models typically rely on domain-specific fine-tuning, limiting their generalization across countless anomaly patterns. In this work, inspired by the great zero-shot generalization ability of foundation models like Segment Anything, we first explore their assembly to leverage diverse multi-modal prior knowledge for anomaly localization. For non-parameter foundation model adaptation to anomaly segmentation, we further introduce hybrid prompts derived from domain expert knowledge and target image context as regularization. Our proposed SAA+ model achieves state-of-the-art performance on several anomaly segmentation benchmarks, including VisA, MVTec-AD, MTD, and KSDD2, in the zero-shot setting. We will release the code at href{", "output": "Segment Any Anomaly without Training via Hybrid Prompt Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Today, we have a mixture of young and older individuals, people with special needs, and people who can care for themselves. Over 1 billion people are estimated to be disabled; this figure corresponds to about 15% of the world's population, with 3.8% (approximately 190 million people) accounting for people aged 15 and up (Organization, 2011). The number of people with disabilities is trending upward due to the increase in chronic health conditions, among other factors. These and other factors have made the need for proper care facilities urgent in today's society. Several care facilities are built to help people with disabilities live their everyday lives and not be left out of the community.", "output": "Ambient Technology & Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite substantial progress in abstractive text summarization to generate fluent and informative texts, the factual inconsistency in the generated summaries remains an important yet challenging problem to be solved. In this paper, we construct causal graphs for abstractive text summarization and identify the intrinsic causes of the factual inconsistency, i.e., the language bias and irrelevancy bias, and further propose a debiasing framework, named CoFactSum, to alleviate the causal effects of these biases by counterfactual estimation. Specifically, the proposed CoFactSum provides two counterfactual estimation strategies, i.e., Explicit Counterfactual Masking with an explicit dynamic masking strategy, and Implicit Counterfactual Training with an implicit discriminative cross-attention mechanism. Meanwhile, we design a Debiasing Degree Adjustment mechanism to dynamically adapt the debiasing degree at each decoding step. Extensive experiments on two widely-used summarization datasets demonstrate the effectiveness of CoFactSum in enhancing the factual consistency of generated summaries compared with several baselines.", "output": "Counterfactual Debiasing for Generating Factually Consistent Text Summaries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep graph clustering has recently received significant attention due to its ability to enhance the representation learning capabilities of models in unsupervised scenarios.
Nevertheless, deep clustering for temporal graphs, which could capture crucial dynamic interaction information, has not been fully explored. This means that in many clustering-oriented real-world scenarios, temporal graphs can only be processed as static graphs. This not only causes the loss of dynamic information but also triggers huge computational consumption. To solve the problem, we propose a general framework for deep Temporal Graph Clustering called TGC, which adjusts deep clustering techniques (clustering assignment distribution and adjacency matrix reconstruction) to suit the interaction sequence-based batch-processing pattern of temporal graphs. In addition, we discuss differences between temporal graph clustering and existing static graph clustering at several levels. To verify the superiority of the proposed framework TGC, we conduct extensive experiments. The experimental results show that temporal graph clustering enables more flexibility in finding a balance between time and space requirements, and our framework can effectively improve the performance of existing temporal graph learning methods. Our code and supplementary material will be released after publication.", "output": "Deep Temporal Graph Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prior beliefs about the latent function to shape inductive biases can be incorporated into a Gaussian Process (GP) via the kernel. However, beyond kernel choices, the decision-making process of GP models remains poorly understood. In this work, we contribute an analysis of the loss landscape for GP models using methods from physics. We demonstrate $nu$-continuity for Matern kernels and outline aspects of catastrophe theory at critical points in the loss landscape. By directly including $nu$ in the hyperparameter optimisation for Matern kernels, we find that typical values of $nu$ are far from optimal in terms of performance, yet prevail in the literature due to the increased computational speed. We also provide an a priori method for evaluating the effect of GP ensembles and discuss various voting approaches based on physical properties of the loss landscape. The utility of these approaches is demonstrated for various synthetic and real datasets. Our findings provide an enhanced understanding of the decision-making process behind GPs and offer practical guidance for improving their performance and interpretability in a range of applications.", "output": "Physics Inspired Approaches Towards Understanding Gaussian Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attack is commonly regarded as a huge threat to neural networks because of misleading behavior. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve the adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples.
This distribution mismatch, together with the mutual learning mechanism with the same learning ratio applied in prior-art defense strategies, is the main cause of the accuracy degradation for benign samples. The proposed AdvAmd is demonstrated to steadily heal the accuracy degradation and even lead to a certain accuracy boost of common neural models on benign classification, object detection, and segmentation tasks. The efficacy of AdvAmd is attributed to three key components, validated through quantitative and ablation experiments: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and the smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities).", "output": "Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, the ever-increasing demand for bandwidth in multi-modal communication systems requires a paradigm shift. Powered by deep learning, semantic communications are applied to multi-modal scenarios to boost communication efficiency and save communication resources. However, the existing end-to-end neural network (NN) based framework without the channel encoder/decoder is incompatible with modern digital communication systems. Moreover, most end-to-end designs are task-specific and require re-design and re-training for new tasks, which limits their applications. In this paper, we propose a distributed multi-modal semantic communication framework incorporating the conventional channel encoder/decoder. We adopt an NN-based semantic encoder and decoder to extract correlated semantic information contained in different modalities, including speech, text, and image. Based on the proposed framework, we further establish a general rate-adaptive coding mechanism for various types of multi-modal semantic tasks. In particular, we utilize unequal error protection based on semantic importance, which is derived by evaluating the distortion bound of each modality. We further formulate and solve an optimization problem that aims at minimizing inference delay while maintaining inference accuracy for semantic tasks. Numerical results show that the proposed mechanism fares better than both conventional communication and existing semantic communication systems in terms of task performance, inference delay, and deployment complexity.", "output": "Rate-Adaptive Coding Mechanism for Semantic Communications With Multi-Modal Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text. In contrast, neuroscience research has identified distinct neural representations for numbers and words. In this work, we investigate how well popular LLMs capture the magnitudes of numbers (e.g., that $4 < 5$) from a behavioral lens. Prior research on the representational capabilities of LLMs evaluates whether they show human-level performance, for instance, high overall accuracy on standard benchmarks.
Here, we ask a different question, one inspired by cognitive science: How closely do the number representations of LLMs correspond to those of human language users, who typically demonstrate the distance, size, and ratio effects? We rely on a linking hypothesis to map the similarities among the model embeddings of number words and digits to human response times. The results reveal surprisingly human-like representations across language models of different architectures, despite the absence of the neural circuitry that directly supports these representations in the human brain. This research shows the utility of understanding LLMs using behavioral benchmarks and points the way to future work on the number representations of LLMs and their cognitive plausibility.", "output": "Numeric Magnitude Comparison Effects in Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly. This skill is evident from a young age as we acquire new abilities and solve problems by imitating others or following natural language instructions. The research community is actively pursuing the development of interactive \"embodied agents\" that can engage in natural conversations with humans and assist them with real-world tasks. These agents must possess the ability to promptly request feedback in case communication breaks down or instructions are unclear. Additionally, they must demonstrate proficiency in learning new vocabulary specific to a given domain. In this paper, we made the following contributions: (1) a crowd-sourcing tool for collecting grounded language instructions; (2) the largest dataset of grounded language instructions; and (3) several state-of-the-art baselines. These contributions are suitable as a foundation for further research.", "output": "Transforming Human-Centered AI Collaboration: Redefining Embodied Agents Capabilities through Interactive Grounded Language Instructions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern online service providers such as online shopping platforms often provide both search and recommendation (S&R) services to meet different user needs. Rarely has there been any effective means of incorporating user behavior data from both S&R services. Most existing approaches either simply treat S&R behaviors separately, or jointly optimize them by aggregating data from both services, ignoring the fact that user intents in S&R can be distinctively different. In our paper, we propose a Search-Enhanced framework for the Sequential Recommendation (SESRec) that leverages users' search interests for recommendation, by disentangling similar and dissimilar representations within S&R behaviors. Specifically, SESRec first aligns query and item embeddings based on users' query-item interactions for the computation of their similarities. Two transformer encoders are used to learn the contextual representations of S&R behaviors independently. Then a contrastive learning task is designed to supervise the disentanglement of similar and dissimilar representations from behavior sequences of S&R.
Finally, we extract user interests by the attention mechanism from three perspectives, i.e., the contextual representations and the two separated behaviors containing similar and dissimilar interests. Extensive experiments on both industrial and public datasets demonstrate that SESRec consistently outperforms state-of-the-art models. Empirical studies further validate that SESRec successfully disentangles similar and dissimilar user interests from their S&R behaviors.", "output": "When Search Meets Recommendation: Learning Disentangled Search Representation for Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sequential Recommender Systems (SRSs) are a popular type of recommender system that learns from a user's history to predict the next item they are likely to interact with. However, user interactions can be affected by noise stemming from account sharing, inconsistent preferences, or accidental clicks. To address this issue, we (i) propose a new evaluation protocol that takes multiple future items into account and (ii) introduce a novel relevance-aware loss function to train an SRS with multiple future items to make it more robust to noise. Our relevance-aware models obtain an improvement of ~1.2% of NDCG@10 and 0.88% in the traditional evaluation protocol, while in the new evaluation protocol, the improvement is ~1.63% of NDCG@10 and ~1.5% of HR w.r.t. the best performing models.", "output": "Integrating Item Relevance in Training Loss for Sequential Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing trends and identifying understudied areas. We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing: ACM International Conference on Multimodal Interaction, AAAC International Conference on Affective Computing and Intelligent Interaction, Annual Meeting of the Association for Computational Linguistics, and Conference on Empirical Methods in Natural Language Processing. We identified 910 affect-related papers and present our analysis of the role of affective phenomena in these papers. We find that this body of research has primarily focused on enabling machines to recognize and express affect and emotion. However, we find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
Based on our analysis, we discuss directions to expand the role of affective phenomena in multimodal interaction research.", "output": "Expanding the Role of Affective Phenomena in Multimodal Interaction Research."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Shear wall structures are widely used in high-rise residential buildings, and the layout of shear walls requires many years of design experience and iterative trial and error. Currently, there are methods based on heuristic algorithms, but they generate results too slowly. Those based on Generative Adversarial Networks (GANs) or Graph Neural Networks (GNNs) can only generate single arrangements and require large amounts of training data. At present, Stable Diffusion is being widely used, and by using the Low-Rank Adaptation (LoRA) method to fine-tune large models with small amounts of data, good generative results can be achieved. Therefore, this paper proposes a personalized AI assistant for shear wall layout based on Stable Diffusion, which has been proven to produce good generative results through testing.", "output": "Constructing a personalized AI assistant for shear wall layout using Stable Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "During 2022, both transformer-based AI text generation systems such as GPT-3 and AI text-to-image generation systems such as DALL-E 2 and Stable Diffusion made exponential leaps forward and are unquestionably altering the fields of digital art and electronic literature. In this panel, a group of electronic literature authors and theorists consider new opportunities for human creativity presented by these systems and present new works they have produced during the past year that specifically address these systems as environments for literary expressions that are translated through iterative interlocutive processes into visual representations. The premise that binds these presentations is that these systems and the works generated must be considered from a literary perspective, as they originate in human writing. In works ranging from a visual memoir of the personal experience of a health crisis, to interactive web comics, to architectures based on abstract poetic language, to political satire, four artists explore the capabilities of these writing environments for new genres of literary artistic practice, while a digital culture theorist considers the origins and effects of the particular training datasets of human language and images on which these new hybrid forms are based.", "output": "AIwriting: Relations Between Image Generation and Digital Writing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Uncertainty-quantification methods are applied to estimate the confidence of deep-neural-network classifiers over their predictions. However, most widely used methods are known to be overconfident. We address this problem by developing an algorithm that exploits the latent-space representation of data points fed into the network to assess the accuracy of their prediction.
Using the latent-space representation generated by the fraction of the training set that the network classifies correctly, we build a statistical model that is able to capture the likelihood of a given prediction. We show on a synthetic dataset that commonly used methods are mostly overconfident. Overconfidence also occurs for predictions made on data points that are outside the distribution that generated the training data. In contrast, our method can detect such out-of-distribution data points as inaccurately predicted, thus aiding in the automatic detection of outliers.", "output": "Uncertainty Quantification in Deep Neural Networks through Statistical Inference on Latent Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a novel explainable image quality evaluation approach called X-IQE, which leverages visual large language models (LLMs) to evaluate text-to-image generation methods by generating textual explanations. X-IQE utilizes a hierarchical Chain of Thought (CoT) to enable MiniGPT-4 to produce self-consistent, unbiased texts that are highly correlated with human evaluation. It offers several advantages, including the ability to distinguish between real and generated images, evaluate text-image alignment, and assess image aesthetics without requiring model training or fine-tuning. X-IQE is more cost-effective and efficient compared to human evaluation, while significantly enhancing the transparency and explainability of deep image quality evaluation models. We validate the effectiveness of our method as a benchmark using images generated by prevalent diffusion models. X-IQE demonstrates similar performance to state-of-the-art (SOTA) evaluation methods on COCO Caption, while overcoming the limitations of previous evaluation models on DrawBench, particularly in handling ambiguous generation prompts and text recognition in generated images. Project website:", "output": "X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Approximation fixpoint theory (AFT) is an abstract and general algebraic framework for studying the semantics of non-monotonic logics. In recent work, AFT was generalized to non-deterministic operators, i.e., operators whose ranges are sets of elements rather than single elements. In this paper, we make three further contributions to non-deterministic AFT: (1) we define and study ultimate approximations of non-deterministic operators, (2) we give an algebraic formulation of the semi-equilibrium semantics by Amendola et al., and (3) we generalize the characterisations of disjunctive logic programs to disjunctive logic programs with aggregates.", "output": "Non-deterministic approximation operators: ultimate operators, semi-equilibrium semantics and aggregates (full version)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) have demonstrated exceptional performance in a variety of tasks, including essay writing and question answering.
However, it is crucial to address the potential misuse of these models, which can lead to detrimental outcomes such as plagiarism and spamming. Recently, several detectors have been proposed, including fine-tuned classifiers and various statistical methods. In this study, we reveal that with the aid of carefully crafted prompts, LLMs can effectively evade these detection systems. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically generate such prompts. On three real-world tasks where LLMs can be misused, SICO successfully enables ChatGPT to evade six existing detectors, causing a significant 0.54 AUC drop on average. Surprisingly, in most cases these detectors perform even worse than random classifiers. These results firmly reveal the vulnerability of existing detectors. Finally, the strong performance of SICO suggests that it can serve as a reliable evaluation protocol for any new detector in this field.", "output": "Large Language Models can be Guided to Evade AI-Generated Text Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Systems for serving inference requests on graph neural networks (GNN) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling; and aggregating many features exhibits high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, limiting throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency with an 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).", "output": "Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved. Automatic subgoal generation (ASG) has recently emerged as a viable MARL approach inspired by utilizing subgoals in intrinsically motivated reinforcement learning. However, end-to-end learning of complex task planning from sparse rewards without prior knowledge undoubtedly requires massive training samples.
Moreover, the diversity-promoting nature of existing ASG methods can lead to the \"over-representation\" of subgoals, generating numerous spurious subgoals of limited relevance to the actual task reward and thus decreasing the sample efficiency of the algorithm. To address this problem, and inspired by disentangled representation learning, we propose a novel \"disentangled\" decision-making method, Semantically Aligned task decomposition in MARL (SAMA), that prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation, as well as self-reflection-based replanning. Additionally, SAMA incorporates language-grounded RL to train each agent's subgoal-conditioned policy. SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods, as evidenced by its performance on two challenging sparse-reward tasks, Overcooked and MiniRTS.", "output": "Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician, which can be imitated by utilizing supervised deep-learning techniques. However, the real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and the style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant dataset domain differences, while ArtFlow is introduced to further reduce the discrepancies between datasets. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF with the state-of-the-art domain adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability.", "output": "Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The trustworthiness of DNNs is often challenged by their vulnerability to minor adversarial perturbations, which may not only undermine prediction accuracy (robustness) but also cause biased predictions for similar inputs (individual fairness). Accurate fairness has been recently proposed to enforce a harmonic balance between accuracy and individual fairness. It induces the notion of the fairness confusion matrix to categorize predictions as true fair, true biased, false fair, and false biased. This paper proposes a harmonic evaluation approach, RobustFair, for the accurate fairness of DNNs, using adversarial perturbations crafted through fairness confusion directed gradient search.
By using Taylor expansions to approximate the ground truths of adversarial instances, RobustFair can particularly identify the robustness defects entangled with spurious fairness, which are often elusive in robustness evaluation and missing in individual fairness evaluation. RobustFair can boost robustness and individual fairness evaluations by identifying robustness or fairness defects simultaneously. Empirical case studies on fairness benchmark datasets show that, compared with the state-of-the-art white-box robustness and individual fairness testing approaches, RobustFair detects 1.77-11.87 times more adversarial perturbations, yielding 1.83-13.12 times more biased and 1.53-8.22 times more false instances. The adversarial instances can then be effectively exploited to improve the accurate fairness (and hence accuracy and individual fairness) of the original deep neural network through retraining. The empirical case studies further show that the adversarial instances identified by RobustFair outperform those identified by the other testing approaches, in promoting 21% accurate fairness and 19% individual fairness on multiple sensitive attributes, without losing accuracy at all or even improving it by up to 4%.", "output": "RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the use of transformer sequence models as dynamics models (TDMs) for control. In a number of experiments in the DeepMind control suite, we find that, first, TDMs perform well in a single-environment learning setting when compared to baseline models. Second, TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist model is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist model is applied to an unseen environment without any further training. We further demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. This makes TDMs a promising ingredient for a foundation model of control.", "output": "A Generalist Dynamics Model for Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions. Compared to the supervised approach, learning is more difficult since bounding box and textual phrase correspondences are unavailable. In light of this, we propose the Semantic Prior Refinement Model (SPRM), whose predictions are obtained by combining the output of two main modules. The first untrained module aims to return a rough alignment between textual phrases and bounding boxes. The second trained module is composed of two sub-components that refine the rough alignment to improve the accuracy of the final phrase-bounding box alignments. The model is trained to maximize the multimodal similarity between an image and a sentence, while minimizing the multimodal similarity of the same sentence and a new unrelated image, carefully selected to help the most during training.
Our approach shows state-of-the-art results on two popular datasets, Flickr30k Entities and ReferIt, shining especially on ReferIt with a 9.6% absolute improvement. Moreover, thanks to the untrained component, it reaches competitive performance using just a small fraction of training examples.", "output": "Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To develop computational agents that better communicate using their own emergent language, we endow the agents with an ability to focus their attention on particular concepts in the environment. Humans often understand an object or scene as a composite of concepts, and those concepts are further mapped onto words. We implement this intuition as cross-modal attention mechanisms in Speaker and Listener agents in a referential game and show attention leads to more compositional and interpretable emergent language. We also demonstrate how attention aids in understanding the learned communication protocol by investigating the attention weights associated with each message symbol and the alignment of attention weights between Speaker and Listener agents. Overall, our results suggest that attention is a promising mechanism for developing more human-like emergent language.", "output": "Emergent Communication with Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. The essence of Diff-Pruning is encapsulated in a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment, undertaken across four diverse datasets, highlights two primary benefits of our proposed method: 1) Efficiency: it enables approximately a 50% reduction in FLOPs at a mere 10% to 20% of the original training expenditure; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained progenitors. Code is available at url{", "output": "Structural Pruning for Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While multilingual neural machine translation has achieved great success, it suffers from the off-target issue, where the translation is in the wrong language. This problem is more pronounced on zero-shot translation tasks. In this work, we find that failing to encode a discriminative target language signal will lead to off-target translation, and that a closer lexical distance (i.e., KL-divergence) between two languages' vocabularies is related to a higher off-target rate.
We also find that solely isolating the vocab of different languages in the decoder can alleviate the problem. Motivated by the findings, we propose Language Aware Vocabulary Sharing (LAVS), a simple and effective algorithm to construct the multilingual vocabulary that greatly alleviates the off-target problem of the translation model by increasing the KL-divergence between languages. We conduct experiments on a multilingual machine translation benchmark in 11 languages. Experiments show that the off-target rate for 90 translation tasks is reduced from 29% to 8%, while the overall BLEU score is improved by an average of 1.9 points without extra training cost or sacrificing the supervised directions' performance. We release the code at href{ reproduction.", "output": "On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite its successes, to date Artificial Intelligence (AI) is still characterized by a number of shortcomings with regard to different application domains and goals. These limitations are arguably both conceptual (e.g., related to underlying theoretical models, such as symbolic vs. connectionist), and operational (e.g., related to robustness and the ability to generalize). Biologically inspired AI, and more specifically brain-inspired AI, promises to provide further biological aspects beyond those that are already traditionally included in AI, making it possible to assess and possibly overcome some of its present shortcomings. This article examines some conceptual, technical, and ethical issues raised by the development and use of brain-inspired AI. Against this background, the paper asks whether there is anything ethically unique about brain-inspired AI. The aim of the paper is to introduce a method that has a heuristic nature and that can be applied to identify and address the ethical issues arising from brain-inspired AI. The conclusion resulting from the application of this method is that, compared to traditional AI, brain-inspired AI raises new foundational ethical issues and some new practical ethical issues, and exacerbates some of the issues raised by traditional AI.", "output": "A method for the ethical analysis of brain-inspired AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reducing the number of bits needed to encode the weights and activations of neural networks is highly desirable as it speeds up their training and inference time while reducing memory consumption. For these reasons, research in this area has attracted significant attention toward developing neural networks that leverage lower-precision computing, such as mixed-precision training. Interestingly, none of the existing approaches has investigated pure 16-bit floating-point settings. In this paper, we shed light on the overlooked efficiency of pure 16-bit floating-point neural networks. As such, we provide a comprehensive theoretical analysis to investigate the factors contributing to the differences observed between 16-bit and 32-bit models. We formalize the concepts of floating-point error and tolerance, enabling us to quantitatively explain the conditions under which a 16-bit model can closely approximate the results of its 32-bit counterpart.
This theoretical exploration offers a perspective distinct from the literature, which attributes the success of low-precision neural networks to its regularization effect. This in-depth analysis is supported by an extensive series of experiments. Our findings demonstrate that pure 16-bit floating-point neural networks can achieve similar or even better performance than their mixed-precision and 32-bit counterparts. We believe the results presented in this paper will have significant implications for machine learning practitioners, offering an opportunity to reconsider using pure 16-bit networks in various applications.", "output": "In Defense of Pure 16-bit Floating-Point Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of teleoperating an industrial robot manipulator via a commercially available Virtual Reality (VR) interface. Previous works on VR teleoperation for robot manipulators focus primarily on collaborative or research robot platforms (whose dynamics and constraints differ from industrial robot arms), or only address tasks where the robot's dynamics are not as important (e.g., pick and place tasks). We investigate the usage of commercially available VR interfaces for effectively teleoperating industrial robot manipulators in a variety of contact-rich manipulation tasks. We find that applying standard practices for VR control of robot arms is challenging for industrial platforms because torque and velocity control is not exposed, and position control is mediated through a black-box controller. To mitigate these problems, we propose a simplified filtering approach to process command signals to enable operators to effectively teleoperate industrial robot arms with VR interfaces in dexterous manipulation tasks. We hope our findings will help robot practitioners implement and set up effective VR teleoperation interfaces for robot manipulators. The proposed method is demonstrated on a variety of contact-rich manipulation tasks which can also involve very precise movement of the robot during execution (videos can be found at", "output": "A Virtual Reality Teleoperation Interface for Industrial Robot Manipulators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prevention is better than cure. This old truth applies not only to the prevention of diseases but also to the prevention of issues with AI models used in medicine. The source of malfunctioning of predictive models often lies not in the training process but reaches back to the data acquisition phase or the design of the experiment. In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process.
Complex models are able to learn such artifacts, and it is difficult to remove this bias during or after the training. Errors made at the data collection stage make it difficult to validate the model correctly. Based on this use case, we show how to monitor data and model balance (fairness) throughout the life cycle of a predictive model, from data acquisition to parity analysis of model scores.", "output": "Prevention is better than cure: a case study of the abnormalities detection in the chest."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In an indivisible participatory budgeting (PB) framework, we have a limited budget that is to be distributed among a set of projects, by aggregating the preferences of voters for the projects. All the prior work on indivisible PB assumes that each project has only one possible cost. In this work, we let each project have a set of permissible costs, each reflecting a possible degree of sophistication of the project. Each voter approves a range of costs for each project, by giving an upper and lower bound on the cost that she thinks the project deserves. The outcome of a PB rule selects a subset of projects and also specifies their corresponding costs. We study different utility notions and prove that the existing positive results when every project has exactly one permissible cost can also be extended to our framework where a project has several permissible costs. We also analyze the fixed-parameter tractability of the problem. Finally, we propose some important and intuitive axioms and analyze their satisfiability by different PB rules. We conclude by making some crucial remarks.", "output": "Participatory Budgeting With Multiple Degrees of Projects And Ranged Approval Votes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ocular Toxoplasmosis (OT) is a common eye infection caused by T. gondii that can cause vision problems. Diagnosis is typically done through a clinical examination and imaging, but these methods can be complicated and costly, requiring trained personnel. To address this issue, we have created a benchmark study that evaluates the effectiveness of existing pre-trained networks using transfer learning techniques to detect OT from fundus images. Furthermore, we have also analysed the performance of transfer-learning based segmentation networks to segment lesions in the images. This research seeks to provide a guide for future researchers looking to utilise DL techniques and develop a cheap, automated, easy-to-use, and accurate diagnostic method. We have performed an in-depth analysis of different feature extraction techniques in order to find the most optimal one for OT classification and segmentation of lesions. For classification tasks, we have evaluated pre-trained models such as VGG16, MobileNetV2, InceptionV3, ResNet50, and DenseNet121. Among them, MobileNetV2 outperformed all other models in terms of Accuracy (Acc), Recall, and F1 Score, outperforming the second-best model, InceptionV3, by 0.7% higher Acc. However, DenseNet121 achieved the best result in terms of Precision, which was 0.1% higher than MobileNetV2. For the segmentation task, this work has exploited the U-Net architecture.
In order to utilize transfer learning, the encoder block of the traditional U-Net was replaced by MobileNetV2, InceptionV3, ResNet34, and VGG16 to evaluate different architectures; moreover, two different loss functions (Dice loss and Jaccard loss) were exploited in order to find the most optimal one. The MobileNetV2/U-Net outperformed ResNet34 by 0.5% and 2.1% in terms of Acc and Dice Score, respectively, when the Jaccard loss function is employed during the training.", "output": "Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The softmax attention mechanism has emerged as a noteworthy development in the field of Artificial Intelligence research, building on the successes of Transformer-based architectures. However, their ever increasing sizes necessitate ever increasing computational memory, which limits their usage. We propose KgV, a sigmoid gating mechanism that, in conjunction with softmax attention, significantly boosts performance without increasing architecture size. To amend the size requirements, we leverage Tensor Chains to identify and prune the excess parameters. We find that such excess resides primarily within the embedding layer, and not in the output linear layer. To further improve embedding and significantly reduce parameters, we introduce H-SoftPOS, a hierarchical embedding layer which simultaneously enhances performance. Remarkably, on the WMT14 English-German validation set, our approach yields a threefold reduction in perplexity, surpassing the current state-of-the-art, while reducing parameter counts also by a factor of 3. When we further reduce the number of parameters up to sevenfold, we can still achieve a 21% decrease in perplexity with respect to the baseline Transformer. To understand generalization capabilities, we conduct experiments on the 7 language pairs of the WMT17 dataset. Our method outperforms existing techniques in terms of test loss while simultaneously halving the number of parameters. Moreover, we observe a 70 times reduction in variance with respect to the prior state-of-the-art. In conclusion, our proposed method yields significant improvements in performance and much lower memory cost. We call the resulting architecture Anthe.", "output": "Less is More! A slim architecture for optimal language translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Masked language modeling (MLM) is a widely used self-supervised pretraining objective, where a model needs to predict an original token that is replaced with a mask given contexts. Although simpler and computationally efficient pretraining objectives, e.g., predicting the first character of a masked token, have recently shown comparable results to MLM, no objectives with a masking scheme actually outperform it in downstream tasks. 
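For reference, the masking scheme that MLM-style objectives build on can be sketched in a few lines of PyTorch. This mirrors the standard BERT 80/10/10 recipe rather than any specific variant studied in the paper.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style dynamic masking: of the selected positions,
    80% become [MASK], 10% a random token, 10% stay unchanged."""
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100  # compute the loss only on masked positions

    # 80% of the masked positions -> [MASK]
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id

    # half of the remainder (10% overall) -> random token; rest unchanged
    random_tok = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                  & masked & ~replace)
    input_ids[random_tok] = torch.randint(vocab_size, input_ids.shape)[random_tok]
    return input_ids, labels
```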
Motivated by the assumption that their lack of complexity plays a vital role in the degradation, we validate whether more complex masked objectives can achieve better results and investigate how much complexity they should have to perform comparably to MLM. Our results using GLUE, SQuAD, and Universal Dependencies benchmarks demonstrate that more complicated objectives tend to show better downstream results, with at least half of the MLM complexity needed to perform comparably to MLM. Finally, we discuss how we should pretrain a model using a masked objective from the task complexity perspective.", "output": "How does the task complexity of masked pretraining objectives affect downstream performance?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lifelong learning agents aim to learn multiple tasks sequentially over a lifetime. This involves the ability to exploit previous knowledge when learning new tasks and to avoid forgetting. Modulating masks, a specific type of parameter isolation approach, have recently shown promise in both supervised and reinforcement learning. While lifelong learning algorithms have been investigated mainly within a single-agent approach, a question remains on how multiple agents can share lifelong learning knowledge with each other. We show that the parameter isolation mechanism used by modulating masks is particularly suitable for exchanging knowledge among agents in a distributed and decentralized system of lifelong learners. The key idea is that the isolation of specific task knowledge to specific masks allows agents to transfer only specific knowledge on-demand, resulting in robust and effective distributed lifelong learning. We assume fully distributed and asynchronous scenarios with dynamic agent numbers and connectivity. An on-demand communication protocol ensures agents query their peers for specific masks to be transferred and integrated into their policies when facing each task. Experiments indicate that on-demand mask communication is an effective way to implement distributed lifelong reinforcement learning and provides a lifelong learning benefit with respect to distributed RL baselines such as DD-PPO, IMPALA, and PPO+EWC. The system is particularly robust to connection drops and demonstrates rapid learning due to knowledge exchange.", "output": "Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. 
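The summarize/synthesize/debug loop described in this abstract can be sketched abstractly as follows. Here `llm_complete` and `validate` are hypothetical placeholders standing in for an LLM call and a plan validator run on the training tasks; they are not a real API, and the prompts are illustrative only.

```python
def synthesize_planner(domain_desc, train_tasks, llm_complete, validate,
                       max_rounds=4):
    """Hypothetical sketch of a CoT-summarize, synthesize, then
    validate-and-reprompt loop for program synthesis."""
    strategy = llm_complete(
        "Summarize this domain and propose a planning strategy:\n" + domain_desc)
    program = llm_complete(
        "Strategy:\n" + strategy
        + "\nWrite a Python program that plans for this domain:\n" + domain_desc)
    for _ in range(max_rounds):
        failures = [err for task in train_tasks
                    for ok, err in [validate(program, task)] if not ok]
        if not failures:
            return program  # all training tasks solved
        # Re-prompt with feedback from the first failing task
        program = llm_complete(
            "The planner failed with this feedback:\n" + failures[0]
            + "\nRevise the program:\n" + program)
    return program
```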
Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.", "output": "Generalized Planning in PDDL Domains with Pretrained Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The objective of Audio-Visual Segmentation (AVS) is to locate sounding objects within visual scenes by accurately predicting pixelwise segmentation masks. In this paper, we present the following contributions: (i), we propose a scalable and annotation-free pipeline for generating artificial data for the AVS task. We leverage existing image segmentation and audio datasets to draw links between category labels, image-mask pairs, and audio samples, which allows us to easily compose (image, audio, mask) triplets for training AVS models; (ii), we introduce a novel Audio-Aware Transformer (AuTR) architecture that features an audio-aware query-based transformer decoder. This architecture enables the model to search for sounding objects with the guidance of audio signals, resulting in more accurate segmentation; (iii), we present extensive experiments conducted on both synthetic and real datasets, which demonstrate the effectiveness of training AVS models with synthetic data generated by our proposed pipeline. Additionally, our proposed AuTR architecture exhibits superior performance and strong generalization ability on public benchmarks. The project page is ", "output": "Annotation-free Audio-Visual Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-level relation extraction (DocRE) aims to infer complex semantic relations among entities in a document. Distant supervision (DS) is able to generate massive auto-labeled data, which can improve DocRE performance. Recent works leverage pseudo labels generated by the pre-denoising model to reduce noise in DS data. However, unreliable pseudo labels bring new noise, e.g., adding false pseudo labels and losing correct DS labels. Therefore, how to select effective pseudo labels to denoise DS data is still a challenge in document-level distant relation extraction. To tackle this issue, we introduce uncertainty estimation technology to determine whether pseudo labels can be trusted. In this work, we propose a Document-level distant Relation Extraction framework with Uncertainty Guided label denoising, UGDRE. Specifically, we propose a novel instance-level uncertainty estimation method, which measures the reliability of the pseudo labels with overlapping relations. By further considering the long-tail problem, we design dynamic uncertainty thresholds for different types of relations to filter high-uncertainty pseudo labels. We conduct experiments on two public datasets. 
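As a rough illustration of uncertainty-guided denoising of this kind (not the authors' exact estimator), one can score pseudo-labels with Monte-Carlo-dropout predictive entropy and keep only those below a per-class threshold, a stand-in for the dynamic, long-tail-aware thresholds described above.

```python
import torch

@torch.no_grad()
def filter_pseudo_labels(model, inputs, thresholds, n_dropout=8):
    """Keep a pseudo-label only if MC-dropout predictive entropy falls
    below a threshold for its predicted class. `thresholds` is a tensor
    with one entropy cutoff per class."""
    model.train()  # keep dropout active so forward passes differ
    probs = torch.stack(
        [model(inputs).softmax(-1) for _ in range(n_dropout)]).mean(0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    preds = probs.argmax(-1)
    keep = entropy < thresholds[preds]  # per-class (dynamic) threshold
    return preds[keep], keep
```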
Our framework outperforms strong baselines by 1.91 F1 and 2.28 Ign F1 on the RE-DocRED dataset.", "output": "Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual Question Answering (VQA) is an emerging area of interest for researchers, being a recent problem in natural language processing and image prediction. In this area, an algorithm needs to answer questions about certain images. As of the writing of this survey, 25 recent studies were analyzed. Besides, 6 datasets were analyzed and their download links provided. In this work, several recent pieces of research in this area were investigated and a deeper analysis and comparison among them were provided, including results, the state-of-the-art, common errors, and possible points of improvement for future researchers.", "output": "Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The quantum perceptron, the variational circuit, and the Grover algorithm have been proposed as promising components for quantum machine learning. This paper presents a new quantum perceptron that combines the quantum variational circuit and the Grover algorithm. However, this does not guarantee that this quantum variational perceptron with Grover's algorithm (QVP-G) will have any advantage over its quantum variational (QVP) and classical counterparts. Here, we examine the performance of QVP and QVP-G by computing their loss function and analyzing their accuracy on the classification task, then comparing these two quantum models to the classical perceptron (CP). The results show that our two quantum models are more efficient than CP, and our novel suggested model QVP-G outperforms the QVP, demonstrating that Grover's algorithm can be applied to the classification task and even makes the model more accurate, beyond unstructured search problems.", "output": "Simulation of a Variational Quantum Perceptron using Grover's Algorithm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, an increasing interest in the management of water and health resources has been recorded. This interest is fed by the global sustainability challenges posed to humanity that have water scarcity and quality at their core. Thus, the availability of effective, meaningful and open data is crucial to address those issues in the broader context of the Sustainable Development Goals of clean water and sanitation as targeted by the United Nations. In this paper, we present the Water Health Open Knowledge Graph (WHOW-KG) along with its design methodology and analysis on impact. WHOW-KG is a semantic knowledge graph that models data on water consumption, pollution, infectious disease rates and drug distribution. The WHOW-KG is developed in the context of the EU-funded WHOW (Water Health Open Knowledge) project and aims at supporting a wide range of applications: from knowledge discovery to decision-making, making it a valuable resource for researchers, policymakers, and practitioners in the water and health domains. 
The WHOW-KG consists of a network of five ontologies and related linked open data, modelled according to those ontologies.", "output": "The Water Health Open Knowledge Graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial Intelligence (AI) is poised to enable a new leap in the creation of scholarly content. New forms of engagement with AI systems, such as collaborations with large language models like GPT-3, offer affordances that will change the nature of both the scholarly process and the artifacts it produces. This article articulates ways in which those artifacts can be written, distributed, read, organized, and stored that are more dynamic, and potentially more effective, than current academic practices. Specifically, rather than the current "early-binding" process (that is, one in which ideas are fully reduced to a final written form before they leave an author's desk), we propose that there are substantial benefits to a "late-binding" process, in which ideas are written dynamically at the moment of reading. In fact, the paradigm of "binding" knowledge may transition to a new model in which scholarship remains ever "unbound" and evolving. An alternative form for a scholarly work could be encapsulated via several key components: a text abstract of the work's core arguments; hyperlinks to a bibliography of relevant related work; novel data that had been collected and metadata describing those data; algorithms or processes necessary for analyzing those data; a reference to a particular AI model that would serve as a "renderer" of the canonical version of the text; and specified parameters that would allow for a precise, word-for-word reconstruction of the canonical version. Such a form would enable both the rendering of the canonical version, and also the possibility of dynamic AI reimaginings of the text in light of future findings, scholarship unknown to the original authors, alternative theories, and precise tailoring to specific audiences (e.g., children, adults, professionals, amateurs).", "output": "Late-Binding Scholarship in the Age of AI: Navigating Legal and Normative Challenges of a New Form of Knowledge Production."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Converting text into the structured query language (Text2SQL) is a research hotspot in the field of natural language processing (NLP), which has broad application prospects. In the era of big data, the use of databases has penetrated all walks of life, in which the collected data is large in scale, diverse in variety, and wide in scope, making data queries cumbersome and inefficient, and putting forward higher requirements for Text2SQL models. In practical applications, the current mainstream end-to-end Text2SQL model is not only difficult to build due to its complex structure and high requirements for training data, but also difficult to adjust due to massive parameters. In addition, it is hard for the model to achieve the desired accuracy. Based on this, this paper proposes a pipelined Text2SQL method: SPSQL. 
This method disassembles the Text2SQL task into four subtasks -- table selection, column selection, SQL generation, and value filling -- which can be converted into a text classification problem, a sequence labeling problem, and two text generation problems, respectively. Then, we construct data formats for the different subtasks based on existing data and improve the accuracy of the overall model by improving the accuracy of each submodel. We also use a named entity recognition module and data augmentation to optimize the overall model. We construct the dataset based on the marketing business data of the State Grid Corporation of China. Experiments demonstrate our proposed method achieves the best performance compared with the end-to-end method and other pipeline methods.", "output": "SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Past work demonstrated that, using neural networks, we can extend unfinished music pieces while maintaining the music style of the musician. With recent advancements in large language models and diffusion models, we are now capable of generating comics with an interesting storyline while maintaining the art style of the artist. In this paper, we used ChatGPT to generate storylines and dialogue and then generated the comic using Stable Diffusion. We introduced a novel way to evaluate AI-generated stories, and we achieved SOTA performance on character fidelity and art style by fine-tuning Stable Diffusion using LoRA, ControlNet, etc.", "output": "Generating coherent comic with rich story using ChatGPT and Stable Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The purpose of this work is to describe the Orkg-Leaderboard software, designed to extract leaderboards defined as Task-Dataset-Metric tuples automatically from large collections of empirical research papers in Artificial Intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the Open Research Knowledge Graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, the system output, when integrated within the ORKG's supported Semantic Web infrastructure of representing machine-actionable 'resources' on the Web, enables: 1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research with the potential to also be complete, contingent on the underlying data source(s) of publications; and 2) specifically, enabling researchers to track the progress in AI with an overview of the state-of-the-art (SOTA) across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performances above 90% F1 on the \textit{leaderboard} extraction task, thus proving Orkg-Leaderboards a practically viable tool for real-world usage. 
Going forward, in a sense, Orkg-Leaderboards transforms the leaderboard extraction task into an automated digitalization task, which has been, for a long time in the community, a crowdsourced endeavor.", "output": "ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A considerable number of texts encountered daily are somehow connected with each other. For example, Wikipedia articles refer to other articles via hyperlinks, scientific papers relate to others via citations or (co)authors, while tweets relate via users that follow each other or reshare content. Hence, a graph-like structure can represent existing connections and be seen as capturing the "context" of the texts. The question thus arises if extracting and integrating such context information into a language model might help facilitate a better automated understanding of the text. In this study, we experimentally demonstrate that incorporating graph-based contextualization into the BERT model enhances its performance on an example of a classification task. Specifically, on the PubMed dataset, we observed a reduction in error from 8.51% to 7.96%, while increasing the number of parameters just by 1.6%. Our source code: ", "output": "Enriching language models with graph-based context information to better understand textual data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although Neural Language Models achieve significant performance in this field, an emerging trend is combining neural models with external knowledge. Most previous approaches rely on the sentence-level retrieval and combination paradigm (retrieval of similar code snippets and use of the corresponding code and summary pairs) on the encoder side. However, this paradigm is coarse-grained and cannot directly take advantage of the high-quality retrieved summary tokens on the decoder side. In this paper, we explore a fine-grained token-level retrieval-augmented mechanism on the decoder side to help the vanilla neural model generate a better code summary. Furthermore, to mitigate the limitation of token-level retrieval on capturing contextual code semantics, we propose to integrate code semantics into summary tokens. Extensive experiments and human evaluation reveal that our token-level retrieval-augmented approach significantly improves performance and is more interpretable.", "output": "Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Self-Driven Strategy Learning (sdsl), a lightweight online learning methodology for automated reasoning tasks that involve solving a set of related problems. sdsl automatically gathers information, in the form of a dataset, while solving earlier problems. It utilizes the learned data to adjust the solving strategy for later problems by fitting a machine learning model to the obtained data on the fly. We formally define the approach as a set of abstract transition rules. 
We describe a concrete instance of the sdsl calculus which uses conditional sampling for generating data and random forests as the underlying machine learning model. We implement the approach on top of the Kissat solver and show that the combination of Kissat+sdsl certifies larger bounds and finds more counter-examples than other state-of-the-art bounded model checking approaches on benchmarks obtained from the latest Hardware Model Checking Competition.", "output": "Lightweight Online Learning for Sets of Related Problems in Automated Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks: audio-visual speech recognition (AVSR), code-switched speech recognition (CS-ASR), and speech translation (ST) on unseen language pairs. We design task-specific prompts, by either leveraging another large-scale model, or simply manipulating the special tokens in the default prompts. Experiments show that compared to the default prompts, our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets. In addition, our experiments reveal many interesting properties of Whisper, including its robustness to prompts, bias on accents, and the multilingual understanding in its latent space. Code is available at", "output": "Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We prove a universal approximation property (UAP) for a class of ODENet and a class of ResNet, which are simplified mathematical models for deep learning systems with skip connections. The UAP can be stated as follows. Let $n$ and $m$ be the dimension of input and output data, and assume $m \leq n$. Then we show that ODENet of width $n+m$ with any non-polynomial continuous activation function can approximate any continuous function on a compact subset of $\mathbb{R}^n$. We also show that ResNet has the same property as the depth tends to infinity. Furthermore, we derive the gradient of a loss function explicitly with respect to a certain tuning variable. We use this to construct a learning algorithm for ODENet. To demonstrate the usefulness of this algorithm, we apply it to a regression problem, a binary classification, and a multinomial classification on MNIST.", "output": "Universal Approximation Properties for an ODENet and a ResNet: Mathematical Analysis and Numerical Experiments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joint predictions, but the computational costs of training large ensembles can become prohibitive. 
We introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. The epinet does not fit the traditional framework of Bayesian neural networks. To accommodate development of approaches beyond BNNs, such as the epinet, we introduce the epistemic neural network (ENN) as an interface for models that produce joint predictions.", "output": "Epistemic Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (under some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. We implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups.", "output": "Super-Reparametrizations of Weighted CSPs: Properties and Optimization Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Myoelectric pattern recognition is one of the important aspects in the design of the control strategy for various applications including upper-limb prostheses and bio-robotic hand movement systems. The current work has proposed an approach to design an energy-efficient EMG-based controller by considering a supervised learning framework using a kernelized SVM classifier for decoding the information of surface electromyography (sEMG) signals to infer the underlying muscle movements. In order to achieve the optimized performance of the EMG-based controller, our main strategy of classifier design is to reduce the false movements of the overall system (when the EMG-based controller is at the `Rest' position). To this end, unlike the traditional single training objective of soft margin kernelized SVM, we have formulated the training algorithm of the proposed supervised learning system as a general constrained multi-objective optimization problem. An elitist multi-objective evolutionary algorithm, the non-dominated sorting genetic algorithm II (NSGA-II), has been used for the tuning of SVM hyperparameters. 
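A minimal stand-in for this two-objective tuning, with a plain grid search replacing NSGA-II, might look like the following. The objectives mirror the abstract's trade-off between overall error and false activations from the 'Rest' class, but the code is illustrative only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

def pareto_front(points):
    """Indices of non-dominated rows when both objectives are minimized."""
    keep = []
    for i, p in enumerate(points):
        dominated = any((q <= p).all() and (q < p).any()
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(i)
    return keep

def tune_svm(X, y, rest_class=0):
    """Evaluate a small hyperparameter grid on two objectives:
    overall error and false movements when the true state is 'Rest'."""
    params = [(C, g) for C in (0.1, 1, 10) for g in (0.01, 0.1, 1)]
    scores = []
    for C, g in params:
        pred = cross_val_predict(SVC(C=C, gamma=g), X, y, cv=5)
        err = (pred != y).mean()
        false_move = ((y == rest_class) & (pred != rest_class)).mean()
        scores.append((err, false_move))
    # Return the Pareto-optimal (C, gamma) settings for the designer to pick from
    return [params[i] for i in pareto_front(np.array(scores))]
```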
We have presented the experimental results by performing the experiments on a dataset consisting of the sEMG signals collected from eleven subjects at five different upper limb positions. It is evident from the presented results that the proposed approach provides much more flexibility to the designer in selecting the parameters of the classifier to optimize the energy efficiency of the EMG-based controller.", "output": "Towards Power-Efficient Design of Myoelectric Controller based on Evolutionary Computation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based models have delivered impressive results on many tasks, particularly vision and language tasks. In many model training situations, conventional configurations are typically adopted. For example, we often set the base model with hidden dimensions (i.e. model width) to be 768 and the number of transformer layers (i.e. model depth) to be 12. In this paper, we revisit these conventional configurations. Through theoretical analysis and experimental evaluation, we show that the masked autoencoder is effective in alleviating the over-smoothing issue in deep transformer training. Based on this finding, we propose Bamboo, an idea of using deeper and narrower transformer configurations, for masked autoencoder training. On ImageNet, with such a simple change in configuration, the re-designed model achieves 87.1% top-1 accuracy and outperforms SoTA models like MAE and BEiT. On language tasks, the re-designed model outperforms BERT with the default setting by 1.1 points on average, on GLUE datasets.", "output": "A Study on Transformer Configuration and Training Objective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph contrastive learning has emerged as a powerful tool for unsupervised graph representation learning. The key to the success of graph contrastive learning is to acquire high-quality positive and negative samples as contrasting pairs for the purpose of learning underlying structural semantics of the input graph. Recent works usually sample negative samples from the same training batch with the positive samples, or from an external irrelevant graph. However, a significant limitation lies in such strategies, which is the unavoidable problem of sampling false negative samples. In this paper, we propose a novel method to utilize a \textbf{C}ounterfactual mechanism to generate artificial hard negative samples for \textbf{G}raph \textbf{C}ontrastive learning, namely \textbf{CGC}, which has a different perspective compared to those sampling-based strategies. We utilize the counterfactual mechanism to produce hard negative samples, which ensures that the generated samples are similar to, but have labels that differ from, the positive sample. The proposed method achieves satisfying results on several datasets compared to some traditional unsupervised graph learning methods and some SOTA graph contrastive learning methods. 
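The role such generated hard negatives play in the contrastive objective can be sketched with a generic InfoNCE loss. This is a common formulation with synthesized negatives appended to the in-batch ones, not CGC's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(z, z_pos, z_hard, tau=0.2):
    """Generic InfoNCE where synthesized hard negatives (e.g., produced
    by a counterfactual perturbation) augment in-batch negatives.
    Shapes: z, z_pos are (B, D); z_hard is (B, K, D)."""
    z = F.normalize(z, dim=-1)
    z_pos = F.normalize(z_pos, dim=-1)
    z_hard = F.normalize(z_hard, dim=-1)

    pos = (z * z_pos).sum(-1, keepdim=True)            # (B, 1) positive logit
    batch_neg = z @ z_pos.t()                          # (B, B) in-batch pairs
    batch_neg.fill_diagonal_(float("-inf"))            # drop the positive pair
    hard_neg = torch.einsum("bd,bkd->bk", z, z_hard)   # (B, K) hard negatives

    logits = torch.cat([pos, batch_neg, hard_neg], dim=1) / tau
    labels = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, labels)  # positive sits in column 0
```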
We also conduct some supplementary experiments to give an extensive illustration of the proposed method, including the performances of CGC with different hard negative samples and evaluations for hard negative samples generated with different similarity measurements.", "output": "Generating Counterfactual Hard Negative Samples for Graph Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternates between constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards, while discovering more diverse strategies, and often with better sample efficiency.", "output": "DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have seen a surge in research on dynamic graph representation learning, which aims to model temporal graphs that are dynamic and evolving constantly over time. However, current work typically models graph dynamics with recurrent neural networks (RNNs), making them suffer seriously from computation and memory overheads on large temporal graphs. So far, the scalability of dynamic graph representation learning on large temporal graphs remains one of the major challenges. In this paper, we present a scalable framework, namely SpikeNet, to efficiently capture the temporal and structural patterns of temporal graphs. We explore a new direction in that we can capture the evolving dynamics of temporal graphs with spiking neural networks (SNNs) instead of RNNs. As a low-power alternative to RNNs, SNNs explicitly model graph dynamics as spike trains of neuron populations and enable spike-based propagation in an efficient way. Experiments on three large real-world temporal graph datasets demonstrate that SpikeNet outperforms strong baselines on the temporal node classification task with lower computational costs. 
Particularly, SpikeNet generalizes to a large temporal graph (2.7M nodes and 13.9M edges) with significantly fewer parameters and computation overheads. Our code is publicly available at \url{", "output": "Scaling Up Dynamic Graph Representation Learning via Spiking Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which this mutual information about the classification label is distributed among the individual neurons is not well understood: while parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these different contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST and CIFAR10 tasks, we observe that representational complexity decreases both through successive hidden layers and over training, and compare the results to related measures. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure and evolution of neural representations and complex systems in general.", "output": "A Measure of the Complexity of Neural Representations based on Partial Information Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST). Given an input audio spectrogram, we first patchify and project it into an initial temporal resolution and embedding dimension, after which the multiple stages in MAST progressively expand the embedding dimension while reducing the temporal resolution of the input. We use a pyramid structure that allows early layers of MAST, operating at a high temporal resolution but low embedding space, to model simple low-level acoustic information, and deeper temporally coarse layers to model high-level acoustic information with high-dimensional embeddings. We also extend our approach to present a new Self-Supervised Learning (SSL) method called SS-MAST, which calculates a symmetric contrastive loss between latent representations from a student and a teacher encoder, leveraging patch-drop, a novel audio augmentation approach that we introduce. In practice, MAST significantly outperforms AST by an average accuracy of 3.4% across 8 speech and non-speech tasks from the LAPE Benchmark, achieving state-of-the-art results on keyword spotting in Speech Commands. 
Additionally, our proposed SS-MAST achieves an absolute average improvement of 2.6% over the previously proposed SSAST.", "output": "MAST: Multiscale Audio Spectrogram Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification. Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks in a low-resource unlabeled audio pre-training setting. Inspired by the recent success of clustering and contrasting learning paradigms for SSL-based speech representation learning, we propose SLICER (Symmetrical Learning of Instance and Cluster-level Efficient Representations), which brings together the best of both clustering and contrasting learning paradigms. We use a symmetric loss between latent representations from student and teacher encoders and simultaneously solve instance and cluster-level contrastive learning tasks. We obtain cluster representations online by just projecting the input spectrogram into an output subspace with dimensions equal to the number of clusters. In addition, we propose a novel mel-spectrogram augmentation procedure, k-mix, based on mixup, which does not require labels and aids unsupervised representation learning for audio. Overall, SLICER achieves state-of-the-art results on the LAPE Benchmark \cite{9868132}, significantly outperforming DeLoRes-M and other prior approaches, which are pre-trained on $10\times$ more unsupervised data. We will make all our code available on GitHub.", "output": "SLICER: Learning universal audio representations using low-resource self-supervised pre-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work is intended as a voice in the discussion over the recent claims that LaMDA, a pretrained language model based on the Transformer model architecture, is sentient. This claim, if confirmed, would have serious ramifications in the Natural Language Processing (NLP) community due to the widespread use of similar models. However, here we take the position that such a language model cannot be sentient, or conscious, and that LaMDA in particular exhibits no advances over other similar models that would qualify it. We justify this by analysing the Transformer architecture through Integrated Information Theory. We see the claims of consciousness as part of a wider tendency to use anthropomorphic language in NLP reporting. Regardless of the veracity of the claims, we consider this an opportune moment to take stock of progress in language modelling and consider the ethical implications of the task. In order to make this work helpful for readers outside the NLP community, we also present the necessary background in language modelling.", "output": "Deanthropomorphising NLP: Can a Language Model Be Conscious?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal effect estimation from observational data is a fundamental task in empirical sciences. 
It becomes particularly challenging when unobserved confounders are involved in a system. This paper focuses on front-door adjustment -- a classic technique which, using observed mediators, allows one to identify causal effects even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Bareinboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given directed acyclic graph (DAG), with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the causal graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity. This result implies an $O(n(n+m))$ delay enumeration algorithm of all front-door adjustment sets, again improving previous work by Jeong et al. by a factor of $n^3$. Moreover, we provide the first linear-time algorithm for finding a minimal front-door adjustment set. We offer implementations of our algorithms in multiple programming languages to facilitate practical usage and empirically validate their feasibility, even for large graphs.", "output": "Linear-Time Algorithms for Front-Door Adjustment in Causal Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Determining the satisfiability of Boolean constraint-satisfaction problems with different types of constraints, that is, hybrid constraints, is a well-studied problem with important applications. We study here a new application of hybrid Boolean constraints, which arises in quantum computing. The problem relates to constrained perfect matching in edge-colored graphs. While general-purpose hybrid constraint solvers can be powerful, we show that direct encodings of the constrained-matching problem as hybrid constraints scale poorly and special techniques are still needed. We propose a novel encoding based on Tutte's Theorem in graph theory as well as optimization techniques. Empirical results demonstrate that our encoding, in suitable languages with advanced SAT solvers, scales significantly better than a number of competing approaches on constrained-matching benchmarks. Our study identifies the necessity of designing problem-specific encodings when applying powerful general-purpose constraint solvers.", "output": "Solving Quantum-Inspired Perfect Matching Problems via Tutte's Theorem-Based Hybrid Boolean Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The generation of molecules with desired properties has gained tremendous popularity, revolutionizing the way scientists design molecular structures and providing valuable support for chemical and drug design. However, despite the potential of language models in molecule generation, they face numerous challenges such as the generation of syntactically or chemically flawed molecules, narrow domain focus, and limitations in creating diverse and directionally feasible molecules due to a dearth of annotated data or external molecular databases. To this end, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. 
MolGen acquires intrinsic structural and grammatical insights by reconstructing over 100 million molecular SELFIES, while facilitating knowledge transfer between different domains through domain-agnostic molecular prefix tuning. Moreover, we present a self-feedback paradigm that inspires the pre-trained model to align with the ultimate goal of producing molecules with desirable properties. Extensive experiments demonstrate that MolGen achieves superior performance on well-known molecule generation benchmarks. Further analysis shows that MolGen can accurately capture molecule distributions, implicitly learn their structural characteristics, and efficiently explore chemical space. The pre-trained model, code, and datasets are publicly available for future research at ", "output": "Domain-Agnostic Molecular Generation with Self-feedback."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ease and speed of spreading misinformation and propaganda on the Web motivate the need to develop trustworthy technology for detecting fallacies in natural language arguments. However, state-of-the-art language modeling methods exhibit a lack of robustness on tasks like logical fallacy classification that require complex reasoning. In this paper, we propose a Case-Based Reasoning method that classifies new cases of logical fallacy by language-modeling-driven retrieval and adaptation of historical cases. We design four complementary strategies to enrich input representation for our model, based on external information about goals, explanations, counterarguments, and argument structure. Our experiments in in-domain and out-of-domain settings indicate that Case-Based Reasoning improves the accuracy and generalizability of language models. Our ablation studies suggest that representations of similar cases have a strong impact on the model performance, that models perform well with fewer retrieved cases, and that the size of the case database has a negligible effect on the performance. Finally, we dive deeper into the relationship between the properties of the retrieved cases and the model performance.", "output": "Case-Based Reasoning with Language Models for Classification of Logical Fallacies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizable subsets of training data. Compared to simple adaptive random subset selection baselines, existing intelligent subset selection approaches are not competitive due to the time-consuming subset selection step, which involves computing model-dependent gradients and feature embeddings and applies greedy maximization of submodular objectives. Our key insight is that removing the reliance on downstream model parameters enables subset selection as a pre-processing step and enables one to train multiple models at no additional cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training while enabling superior model convergence and performance by using an easy-to-hard curriculum. 
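A schematic of the easy-to-hard schedule MILO describes, decoupled from any downstream model, could look like the following. The difficulty scores are left abstract (the paper's model-independent scoring is not reproduced here) and the numbers are illustrative.

```python
import numpy as np

def curriculum_subsets(difficulty, n_rounds=5, start_frac=0.2):
    """Model-agnostic easy-to-hard schedule: rank examples once by a
    precomputed, model-independent difficulty score and grow the
    training subset each round."""
    order = np.argsort(difficulty)  # easiest examples first
    n = len(difficulty)
    fracs = np.linspace(start_frac, 1.0, n_rounds)
    return [order[: max(1, int(f * n))] for f in fracs]

# Hypothetical usage: difficulty from, e.g., a cheap proxy model's losses
rng = np.random.default_rng(0)
subsets = curriculum_subsets(rng.random(1000))
for r, idx in enumerate(subsets):
    print(f"round {r}: train on {len(idx)} examples")
```

Because the ranking is computed once as pre-processing, the same schedule can be reused to train or tune multiple models at no extra selection cost, which is the decoupling the abstract emphasizes.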
Our empirical results indicate that MILO can train models $3\times - 10\times$ faster and tune hyperparameters $20\times - 75\times$ faster than full-dataset training or tuning without compromising performance.", "output": "MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.", "output": "Reinforcement Learning with History-Dependent Dynamic Contexts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tensor completion is important to many areas such as computer vision, data analysis, and signal processing. Enforcing low-rank structures on completed tensors, a category of methods known as low-rank tensor completion, has recently been studied extensively. Whilst such methods attained great success, none considered exploiting numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents the algorithms from reaching optimal accuracy. This work attempts to construct a new methodological framework called GCDTC (Generalized CP Decomposition Tensor Completion) for leveraging numerical priors and achieving higher accuracy in tensor completion. In this newly introduced framework, a generalized form of CP Decomposition is applied to low-rank tensor completion. This paper also proposes an algorithm known as SPTC (Smooth Poisson Tensor Completion) for nonnegative integer tensor completion as an instantiation of the GCDTC framework. A series of experiments on real-world data indicate that SPTC could produce results superior in completion accuracy to current state-of-the-art methods. Related code is available in the supplemental materials.", "output": "Exploring Numerical Priors for Low-Rank Tensor Completion with Generalized CP Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has demonstrated that robustness (to "corruption") can be at odds with generalization. Adversarial training, for instance, aims to reduce the problematic susceptibility of modern neural networks to small data perturbations. Surprisingly, overfitting is a major concern in adversarial training despite being mostly absent in standard training. 
We provide here theoretical evidence for this peculiar "robust overfitting" phenomenon. Subsequently, we advance a novel distributionally robust loss function bridging robustness and generalization. We demonstrate, both theoretically and empirically, that the loss enjoys a certified level of robustness against two common types of corruption -- data evasion and poisoning attacks -- while ensuring guaranteed generalization. We show through careful numerical experiments that our resulting holistic robust (HR) training procedure yields SOTA performance. Finally, we indicate that HR training can be interpreted as a direct extension of adversarial training and comes with a negligible additional computational burden. A ready-to-use Python library implementing our algorithm is available at ", "output": "Certified Robust Neural Networks: Generalization and Corruption Resistance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce UnFuSeD, a novel approach to leverage self-supervised learning and reduce the need for large amounts of labeled data for audio classification. Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step. We first train an encoder using a novel self-supervised learning (SSL) algorithm on an unlabeled audio dataset. Then, we use that encoder to generate pseudo-labels on our target task dataset via clustering the extracted representations. These pseudo-labels are then used to guide self-distillation on a randomly initialized model, which we call unsupervised fine-tuning. Finally, the resultant encoder is then fine-tuned on our target task dataset. Through UnFuSeD, we propose the first system that moves away from generic SSL paradigms in literature, which pre-train and fine-tune the same encoder, and present a novel self-distillation-based system to leverage SSL pre-training for low-resource audio classification. In practice, UnFuSeD achieves state-of-the-art results on the LAPE Benchmark, significantly outperforming all our baselines. Additionally, UnFuSeD allows us to achieve this at a 40% reduction in the number of parameters over the previous state-of-the-art system. We make all our code publicly available.", "output": "UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has shown the promise of creating generalist, transformer-based models for language, vision, and sequential decision-making problems. To create such models, we generally require centralized training objectives, data, and compute. It is of interest if we can more flexibly create generalist policies by merging together multiple, task-specific, individually trained policies. In this work, we take a preliminary step in this direction through merging, or averaging, subsets of Decision Transformers in parameter space trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. 
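Parameter-space merging of this kind reduces, in its simplest uniform form, to averaging state dicts. A hedged PyTorch sketch, assuming the models share an identical architecture, follows; weighted variants (e.g., the Fisher-based weighting discussed next) only change the `weights` argument.

```python
import torch

def average_state_dicts(state_dicts, weights=None):
    """Uniform (or weighted) parameter-space averaging of models that
    share an architecture, e.g., policies fine-tuned from a common
    pre-trained initialization. Non-float buffers (such as batch-norm
    counters) may need special handling in real models."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage with two task-specific policies m1 and m2:
# merged = average_state_dicts([m1.state_dict(), m2.state_dict()])
# model.load_state_dict(merged)
```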
We also show that when merging policies, we can obtain better results if all policies start from common, pre-trained initializations. We also find improvements from larger pre-trained models, and from utilizing Fisher information for merging. In general, we believe research in this direction could help democratize and distribute the process by which generally capable models are formed.", "output": "Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Attention is the crucial cognitive ability that limits and selects what information we observe. Previous work by Bolander et al. (2016) proposes a model of attention based on dynamic epistemic logic (DEL) where agents are either fully attentive or not attentive at all. While introducing the realistic feature that inattentive agents believe nothing happens, the model does not represent the most essential aspect of attention: its selectivity. Here, we propose a generalization that allows for paying attention to subsets of atomic formulas. We introduce the corresponding logic for propositional attention, and show its axiomatization to be sound and complete. We then extend the framework to account for inattentive agents that, instead of assuming nothing happens, may default to a specific truth-value of what they failed to attend to (a sort of prior concerning the unattended atoms). This feature allows for a more cognitively plausible representation of the inattentional blindness phenomenon, where agents end up with false beliefs due to their failure to attend to conspicuous but unexpected events. Both versions of the model define attention-based learning through appropriate DEL event models based on a few and clear edge principles. While the size of such event models grows exponentially both with the number of agents and the number of atoms, we introduce a new logical language for describing event models syntactically and show that using this language our event models can be represented linearly in the number of agents and atoms. Furthermore, representing our event models using this language is achieved by a straightforward formalisation of the aforementioned edge principles.", "output": "Attention! Dynamic Epistemic Logic Models of (In)attentive Agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Entropy measures are effective features for time series classification problems. Traditional entropy measures, such as Shannon entropy, use a probability distribution function. However, for the effective separation of time series, new entropy estimation methods are required to characterize the chaotic dynamics of the system. Our concept of Neural Network Entropy (NNetEn) is based on the classification of special datasets in relation to the entropy of the time series recorded in the reservoir of the neural network. NNetEn estimates the chaotic dynamics of time series in an original way and does not take into account probability distribution functions. We propose two new classification metrics: R2 Efficiency and Pearson Efficiency. The efficiency of NNetEn is verified on separation of two chaotic time series of sine mapping using dispersion analysis. 
For two close dynamic time series (r = 1.1918 and r = 1.2243), the F-ratio has reached the value of 124 and reflects the high efficiency of the introduced method in classification problems. The electroencephalography signal classification for healthy persons and patients with Alzheimer's disease illustrates the practical application of the NNetEn features. Our computations demonstrate the synergistic effect of increasing classification accuracy when applying traditional entropy measures and the NNetEn concept conjointly. An implementation of the algorithms in Python is presented.", "output": "Neural Network Entropy (NNetEn): Entropy-Based EEG Signal and Chaotic Time Series Classification, Python Package for NNetEn Calculation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid shot selection for training in low-resource settings via HATELEXICON. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on shots sampled randomly. Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.", "output": "Sociocultural knowledge is needed for selection of shots in hate speech detection tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapidly increasing demand for oriented object detection, e.g., in autonomous driving and remote sensing, the recently proposed paradigm involving the weakly-supervised detector H2RBox for learning rotated box (RBox) from the (currently) more readily-available horizontal box (HBox) has shown promise. This paper presents H2RBox-v2, to further bridge the gap between HBox-supervised and RBox-supervised oriented object detection. Specifically, we propose to leverage the reflection symmetry via flip and rotate consistencies, using a weakly-supervised network branch similar to H2RBox, together with a novel self-supervised branch that learns orientations from the symmetry inherent in visual objects. The detector is further stabilized and enhanced by practical techniques to cope with peripheral issues, e.g., angular periodicity. To the best of our knowledge, H2RBox-v2 is the first symmetry-aware self-supervised paradigm for oriented object detection. In particular, our method shows less susceptibility to low-quality annotation and insufficient training data compared to H2RBox. Specifically, H2RBox-v2 achieves very close performance to a rotation annotation trained counterpart -- Rotated FCOS: 1) DOTA-v1.0/1.5/2.0: 72.31%/64.76%/50.33% vs. 72.44%/64.53%/51.77%; 2) HRSC: 89.66% vs. 88.99%; 3) FAIR1M: 42.27% vs. 
41.25%.", "output": "H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exploration is a fundamental aspect of reinforcement learning (RL), and itseffectiveness is a deciding factor in the performance of RL algorithms,especially when facing sparse extrinsic rewards. Recent studies have shown theeffectiveness of encouraging exploration with intrinsic rewards estimated fromnovelties in observations. However, there is a gap between the novelty of anobservation and an exploration, as both the stochasticity in the environmentand the agent's behavior may affect the observation. To evaluate exploratorybehaviors accurately, we propose DEIR, a novel method in which we theoreticallyderive an intrinsic reward with a conditional mutual information term thatprincipally scales with the novelty contributed by agent explorations, and thenimplement the reward with a discriminative forward model. Extensive experimentson both standard and advanced exploration tasks in MiniGrid show that DEIRquickly learns a better policy than the baselines. Our evaluations on ProcGendemonstrate both the generalization capability and the general applicability ofour intrinsic reward. Our source code is available at", "output": "DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial training is an effective learning technique to improve therobustness of deep neural networks. In this study, the influence of adversarialtraining on deep learning models in terms of fairness, robustness, andgeneralization is theoretically investigated under more general perturbationscope that different samples can have different perturbation directions (theadversarial and anti-adversarial directions) and varied perturbation bounds.Our theoretical explorations suggest that the combination of adversaries andanti-adversaries (samples with anti-adversarial perturbations) in training canbe more effective in achieving better fairness between classes and a bettertradeoff between robustness and generalization in some typical learningscenarios (e.g., noisy label learning and imbalance learning) compared withstandard adversarial training. On the basis of our theoretical findings, a moregeneral learning objective that combines adversaries and anti-adversaries withvaried bounds on each training sample is presented. Meta learning is utilizedto optimize the combination weights. Experiments on benchmark datasets underdifferent learning scenarios verify our theoretical findings and theeffectiveness of the proposed methodology.", "output": "Combining Adversaries with Anti-adversaries in Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite remarkable successes, deep reinforcement learning algorithms remainsample inefficient: they require an enormous amount of trial and error to findgood policies. Model-based algorithms promise sample efficiency by building anenvironment model that can be used for planning. 
Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. PSDRL combines efficient uncertainty quantification over latent state space models with a specially tailored continual planning algorithm based on value-function approximation. Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency.", "output": "Posterior Sampling for Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce ARBEx, a novel attentive feature extraction framework driven by a Vision Transformer with reliability balancing to cope with poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We reinforce several data pre-processing and refinement methods along with a window-based cross-attention ViT to squeeze the best of the data. We also employ learnable anchor points in the embedding space with label distributions and a multi-head self-attention mechanism to optimize performance against weak predictions with reliability balancing, which is a strategy that leverages anchor points, attention scores, and confidence values to enhance the resilience of label predictions. To ensure correct label classification and improve the models' discriminative power, we introduce anchor loss, which encourages large margins between anchor points. Additionally, the multi-head self-attention mechanism, which is also trainable, plays an integral role in identifying accurate labels. This approach provides critical elements for improving the reliability of predictions and has a substantial positive effect on final prediction capabilities. Our adaptive model can be integrated with any deep neural network to forestall challenges in various recognition tasks. Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.", "output": "ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models based on data from different tasks (e.g., document translation and chat translation). Although the dataset-specific models have achieved impressive performance, it is cumbersome as each dataset demands a model to be designed, trained, and stored. In this work, we aim to unify these translation tasks into a more general setting. 
Specifically, we propose a ``versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT), that works with data from different tasks and can translate well in multiple settings simultaneously, theoretically with as many tasks as desired. Through unified learning, UMLNMT is able to jointly train across multiple tasks, implementing intelligent on-demand translation. On seven widely-used translation tasks, including sentence translation, document translation, and chat translation, our UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs. Furthermore, UMLNMT can achieve competitive or better performance than state-of-the-art dataset-specific methods. Human evaluation and in-depth analysis also demonstrate the superiority of our approach in generating diverse and high-quality translations. Additionally, we provide a new genre translation dataset about famous aphorisms with 186k Chinese->English sentence pairs.", "output": "Unified Model Learning for Various Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Trained on a vast amount of data, Large Language Models (LLMs) have achieved unprecedented success and generalization in modeling fairly complex textual inputs in the abstract space, making them powerful tools for zero-shot learning. Such capability is extended to other modalities such as the visual domain using cross-modal foundation models such as CLIP, and as a result, semantically meaningful representations are extractable from visual inputs. In this work, we leverage this capability and propose an approach that can provide semantic insights into a model's patterns of failures and biases. Given a black box model, its training data, and task definition, we first calculate its task-related loss for each data point. We then extract a semantically meaningful representation for each training data point (such as CLIP embeddings from its visual encoder) and train a lightweight diagnosis model which maps this semantically meaningful representation of a data point to its task loss. We show that an ensemble of such lightweight models can be used to generate insights on the performance of the black-box model, in terms of identifying its patterns of failures and biases.", "output": "LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a Large Visual Language Model (LVLM) from scratch, like GPT-4, is resource-intensive. Our paper presents a plug-and-play module for Large Language Models (LLMs), namely the Interactive Perception Network (IPN), aiming to achieve an LVLM by incorporating the image understanding capability into LLMs. Previous methods incorporate visual information into LLMs with a simple visual mapping network, where the image feature is projected into the embedding space of LLMs via a linear layer. Such a mapping network projects the image feature once yet does not consider the interaction between the image and the human input query. Hence, the obtained visual information with no connection to human intention may be inadequate for LLMs to make intention-following responses, which we term static visual information. 
IPN addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term the dynamic interaction between the LLM and visual information. Specifically, IPN consists of a simple visual mapping network to provide the basic perception of an image for LLMs. It also contains additional modules responsible for acquiring requests from LLMs, performing request-based visual information interaction, and transmitting the resulting interacted visual information to LLMs, respectively. In this way, LLMs act to understand the human query, deliver the corresponding request to the request-based visual information interaction module, and generate the response based on the interleaved multimodal information. We evaluate IPN through extensive experiments on multimodal question answering, reasoning, and so on, demonstrating that it significantly improves the zero-shot performance of LVLMs on various multimodal tasks compared to previous methods.", "output": "LMEye: An Interactive Perception Network for Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities. However, their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data. Additionally, the lack of transparency of most state-of-the-art (SOTA) LLMs, which can only be accessed via APIs, impedes further fine-tuning with domain custom data. Moreover, providing private data to the LLMs' owner leads to data privacy problems. To address these challenges, we propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge without altering the LLMs' parameters. Our PKG is based on open-source \"white-box\" language models, allowing offline memory of any knowledge that LLMs require. We demonstrate that our PKG framework can enhance the performance of \"black-box\" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge.", "output": "Augmented Large Language Models with Parametric Knowledge Guiding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in self-supervised learning and neural network scaling have enabled the creation of large models, known as foundation models, which can be easily adapted to a wide range of downstream tasks. The current paradigm for comparing foundation models involves evaluating them with aggregate metrics on various benchmark datasets. This method of model comparison is heavily dependent on the chosen evaluation metric, which makes it unsuitable for situations where the ideal metric is either not obvious or unavailable. In this work, we present a methodology for directly comparing the embedding space geometry of foundation models, which facilitates model comparison without the need for an explicit evaluation metric. Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity on a per-datum basis. 
Further, we demonstrate how our methodology can be extended to facilitate population-level model comparison. In particular, we show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics. We remark on the utility of this population-level model comparison as a first step towards a taxonomic science of foundation models.", "output": "Comparing Foundation Models using Data Kernels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Zero-shot sketch-based image retrieval (ZS-SBIR) is challenging due to the cross-domain nature of sketches and photos, as well as the semantic gap between seen and unseen image distributions. Previous methods fine-tune pre-trained models with various side information and learning strategies to learn a compact feature space that is shared between the sketch and photo domains and bridges seen and unseen classes. However, these efforts are inadequate in adapting domains and transferring knowledge from seen to unseen classes. In this paper, we present an effective ``Adapt and Align'' approach to address the key challenges. Specifically, we insert simple and lightweight domain adapters to learn new abstract concepts of the sketch domain and improve cross-domain representation capabilities. Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes. Extensive experiments on three benchmark datasets and two popular backbones demonstrate the superiority of our method in terms of retrieval accuracy and flexibility.", "output": "Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In everyday life, humans often plan their actions by following step-by-step instructions in the form of goal-oriented scripts. Previous work has exploited language models (LMs) to plan for abstract goals of stereotypical activities (e.g., \"make a cake\"), but leaves more specific goals with multi-facet constraints understudied (e.g., \"make a cake for diabetics\"). In this paper, we define the task of constrained language planning for the first time. We propose an overgenerate-then-filter approach to improve large language models (LLMs) on this task, and use it to distill a novel constrained language planning dataset, CoScript, which consists of 55,000 scripts. Empirical results demonstrate that our method significantly improves the constrained language planning ability of LLMs, especially on constraint faithfulness. Furthermore, CoScript is demonstrated to be quite effective in endowing smaller LMs with constrained language planning ability.", "output": "Distilling Script Knowledge from Large Language Models for Constrained Language Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent release of large language model (LLM) based chatbots, such as ChatGPT, has attracted significant attention to foundation models. 
It is widely believed that foundation models will serve as the fundamental building blocks for future AI systems. As foundation models are in their early stages, the design of foundation model based systems has not yet been systematically explored. There is little understanding of the impact of introducing foundation models in software architecture. Therefore, in this paper, we propose a taxonomy of foundation model based systems, which classifies and compares the characteristics of foundation models and design options of foundation model based systems. Our taxonomy comprises three categories: foundation model pretraining and fine-tuning, architecture design of foundation model based systems, and responsible-AI-by-design. This taxonomy provides concrete guidance for making major design decisions when designing foundation model based systems and highlights trade-offs arising from design decisions.", "output": "A Framework for Designing Foundation Model based Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We show the convergence of Wasserstein inverse reinforcement learning for multi-objective optimizations with the projective subgradient method by formulating an inverse problem of the multi-objective optimization problem. In addition, we prove convergence of inverse reinforcement learning (maximum entropy inverse reinforcement learning, guided cost learning) with gradient descent and the projective subgradient method.", "output": "A proof of convergence of inverse reinforcement learning for multi-objective optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hawkes processes are a popular framework to model the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of Hawkes processes based on controlled differential equations (HP-CDE), by adopting the neural controlled differential equation (neural CDE) technology, which is an analogue of continuous RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated preserving their uneven temporal spaces, and ii) the log-likelihood can be exactly computed. Moreover, as both Hawkes processes and neural CDEs were first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.", "output": "Hawkes Process Based on Controlled Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal reasoning ability is crucial for numerous NLP applications. Despite the impressive emerging ability of ChatGPT in various NLP tasks, it is unclear how well ChatGPT performs in causal reasoning. 
In this paper, we conduct the first comprehensive evaluation of ChatGPT's causal reasoning capabilities. Experiments show that ChatGPT is not a good causal reasoner, but a good causal interpreter. Besides, ChatGPT suffers from serious hallucination on causal reasoning, possibly due to the reporting biases between causal and non-causal relationships in natural language, as well as ChatGPT's upgrading processes, such as RLHF. The In-Context Learning (ICL) and Chain-of-Thought (CoT) techniques can further exacerbate such causal hallucination. Additionally, the causal reasoning ability of ChatGPT is sensitive to the words used to express the causal concept in prompts, and close-ended prompts perform better than open-ended prompts. For events in sentences, ChatGPT excels at capturing explicit causality rather than implicit causality, and performs better in sentences with lower event density and smaller lexical distance between events.", "output": "Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven industrial health prognostics require rich training data to develop accurate and reliable predictive models. However, stringent data privacy laws and the abundance of edge industrial data necessitate decentralized data utilization. Thus, the industrial health prognostics field is well suited to significantly benefit from federated learning (FL), a decentralized and privacy-preserving learning technique. However, FL-based health prognostics tasks have hardly been investigated due to the complexities of meaningfully aggregating model parameters trained from heterogeneous data to form a high performing federated model. Specifically, data heterogeneity among edge devices, stemming from dissimilar degradation mechanisms and unequal dataset sizes, poses a critical statistical challenge for developing accurate federated models. We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm to discriminatingly learn from heterogeneous edge data. The algorithm searches across the heterogeneous locally trained models and matches neurons with probabilistically similar feature extraction functions first, before selectively averaging them to form the federated model parameters. As the algorithm only averages similar neurons, as opposed to conventional naive averaging of coordinate-wise neurons, the distinct feature extractors of local models are carried over with less dilution to the resultant federated model. Using both cyclic degradation data of Li-ion batteries and non-cyclic data of turbofan engines, we demonstrate that the proposed method yields accuracy improvements as high as 44.5% and 39.3% for state-of-health estimation and remaining useful life estimation, respectively.", "output": "A Federated Learning-based Industrial Health Prognostics for Heterogeneous Edge Devices using Matched Feature Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Message passing neural networks (MPNNs) have been shown to suffer from the phenomenon of over-squashing that causes poor performance for tasks relying on long-range interactions. This can be largely attributed to message passing only occurring locally, over a node's immediate neighbours. 
Rewiring approaches attempting to make graphs 'more connected', and supposedly better suited to long-range tasks, often lose the inductive bias provided by distance on the graph, since they make distant nodes communicate instantly at every layer. In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph. We also propose a delay mechanism that permits skip connections between nodes depending on the layer and their mutual distance. We validate our approach on several long-range tasks and show that it outperforms graph Transformers and multi-hop MPNNs.", "output": "DRew: Dynamically Rewired Message Passing with Delay."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There have been growing concerns regarding the out-of-domain generalization ability of natural language processing (NLP) models, particularly in question-answering (QA) tasks. Current synthesized data augmentation methods for QA are hampered by increased training costs. To address this issue, we propose a novel approach that combines prompting methods and a linear probing then fine-tuning strategy, which does not entail additional cost. Our method has been theoretically and empirically shown to be effective in enhancing the generalization ability of both generative and discriminative models. Our approach outperforms state-of-the-art baselines, with an average increase in F1 score of 4.5%-7.9%. Furthermore, our method can be easily integrated into any pre-trained models and offers a promising solution to the under-explored cross-domain QA task. We release our source code at GitHub*.", "output": "Learning to Generalize for Cross-domain QA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing size of language models raises great research interest in parameter-efficient fine-tuning such as LoRA, which freezes the pre-trained model and injects small-scale trainable parameters for multiple downstream tasks (e.g., summarization, question answering and translation). To further enhance the efficiency of fine-tuning, we propose a framework that integrates LoRA and structured layer pruning. The integrated framework is validated on two created deidentified medical report summarization datasets based on MIMIC-IV-Note and two public medical dialogue datasets. By tuning 0.6% of the parameters of the original model and pruning over 30% of the Transformer layers, our framework can reduce GPU memory usage by 50% and speed up the training phase by 100%, while preserving over 92% of the generation quality on free-text sequence-to-sequence tasks.", "output": "Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. 
Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints, which can affect the analysis of the long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT), for fusing short- and long-term temporal information that combines a temporally-rich spatial feature extractor and a multi-scale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then combines local and global features and classifies surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently. Compared to Trans-SVNet, LoViT achieves a 2.39 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.14 pp improvement on AutoLaparo. Moreover, it achieves a 5.25 pp improvement in phase-level Jaccard on AutoLaparo and a 1.55 pp improvement on Cholec80. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics whilst introducing mechanisms that cope with long videos.", "output": "LoViT: Long Video Transformer for Surgical Phase Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Protecting user privacy is a major concern for many machine learning systems that are deployed at scale and collect data from a diverse population. One way to address this concern is by collecting and releasing data labels in an aggregated manner so that the information about a single user is potentially combined with others. In this paper, we explore the possibility of training machine learning models with aggregated data labels, rather than individual labels. Specifically, we consider two natural aggregation procedures suggested by practitioners: curated bags where the data points are grouped based on common features and random bags where the data points are grouped randomly in bags of similar sizes. For the curated bag setting and for a broad range of loss functions, we show that we can perform gradient-based learning without any degradation in performance that may result from aggregating data. Our method is based on the observation that the sum of the gradients of the loss function on individual data examples in a curated bag can be computed from the aggregate label without the need for individual labels. For the random bag setting, we provide a generalization risk bound based on the Rademacher complexity of the hypothesis class and show how empirical risk minimization can be regularized to achieve the smallest risk bound. In fact, in the random bag setting, there is a trade-off between the size of the bag and the achievable error rate, as our bound indicates. Finally, we conduct a careful empirical study to confirm our theoretical findings. 
In particular, our results suggest that aggregate learning can be an effective method for preserving user privacy while maintaining model accuracy.", "output": "Learning from Aggregated Data: Curated Bags versus Random Bags."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural-based utility functions. However, the performance of MBR decoding depends heavily on how and how many candidates are sampled from the model. In this paper, we explore how different sampling approaches for generating candidate lists for MBR decoding affect performance. We evaluate popular sampling approaches, such as ancestral, nucleus, and top-k sampling. Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon, ensuring that each token in a sample receives a fair probability mass. Through extensive human evaluations, we demonstrate that MBR decoding based on epsilon-sampling significantly outperforms not only beam search decoding, but also MBR decoding with all other tested sampling methods across four language pairs.", "output": "Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although a multitude of NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB identities requires understanding how such identities uniquely interact with societal gender norms and how they differ from gender binary-centric perspectives. Such measurement frameworks inherently require centering TGNB voices to help guide the alignment between gender-inclusive NLP and whom they are intended to serve. Towards this goal, we ground our work in the TGNB community and existing interdisciplinary literature to assess how the social reality surrounding experienced marginalization by TGNB persons contributes to and persists within Open Language Generation (OLG). By first understanding their marginalization stressors, we evaluate (1) misgendering and (2) harmful responses to gender disclosure. To do this, we introduce the TANGO dataset, comprising template-based text curated from real-world text within a TGNB-oriented community. We discover a dominance of binary gender norms within the models; LLMs least misgendered subjects in generated text when triggered by prompts whose subjects used binary pronouns. Meanwhile, misgendering was most prevalent when triggering generation with singular they and neopronouns. When prompted with gender disclosures, LLM text contained stigmatizing language and scored most toxic when triggered by TGNB gender disclosure. 
Our findings warrant further research on how TGNB harms manifest in LLMs and serve as a broader case study toward concretely grounding the design of gender-inclusive AI in community voices and interdisciplinary literature.", "output": "\"I'm fully who I am\": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We prove that Wasserstein inverse reinforcement learning enables the learner's reward values to imitate the expert's reward values in a finite iteration for multi-objective optimizations. Moreover, we prove that Wasserstein inverse reinforcement learning enables the learner's optimal solutions to imitate the expert's optimal solutions for multi-objective optimizations with lexicographic order.", "output": "A proof of imitation of Wasserstein inverse reinforcement learning for multi-objective optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent popularity of large language models (LLMs) has brought significant impact to boundless fields, particularly through their open-ended ecosystem such as the APIs, open-sourced models, and plugins. However, with their widespread deployment, there is a general lack of research that thoroughly discusses and analyzes the potential risks concealed. To that end, we conduct a preliminary but pioneering study covering the robustness, consistency, and credibility of LLM systems. With most of the related literature in the era of LLMs uncharted, we propose an automated workflow that copes with an upscaled number of queries/responses. Overall, we conduct over a million queries to the mainstream LLMs including ChatGPT, LLaMA, and OPT. Core to our workflow is a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metrical systems. As a result, we draw several, and perhaps unfortunate, conclusions that are quite uncommon in this trendy community. Briefly, they are: (i) the minor but inevitable error occurrence in the user-generated query input may, by chance, cause the LLM to respond unexpectedly; (ii) LLMs possess poor consistency when processing semantically similar query input. In addition, as a side finding, we find that ChatGPT is still capable of yielding the correct answer even when the input is polluted at an extreme level. While this phenomenon demonstrates the powerful memorization of LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic development. To deal with it, we propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation. Extensive empirical studies are tagged to support the aforementioned claims.", "output": "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Revolutionary advancements in Large Language Models have drastically reshaped our interactions with artificial intelligence systems. Despite this, a notable hindrance remains: the deficiency of a long-term memory mechanism within these models. 
This shortfall becomes increasingly evident in situations demanding sustained interaction, such as personal companion systems and psychological counseling. Therefore, we propose MemoryBank, a novel memory mechanism tailored for LLMs. MemoryBank enables the models to summon relevant memories, continually evolve through continuous memory updates, and comprehend and adapt to a user's personality by synthesizing information from past interactions. To mimic anthropomorphic behaviors and selectively preserve memory, MemoryBank incorporates a memory updating mechanism, inspired by the Ebbinghaus Forgetting Curve theory, which permits the AI to forget and reinforce memory based on time elapsed and the relative significance of the memory, thereby offering a human-like memory mechanism. MemoryBank is versatile in accommodating both closed-source models like ChatGPT and open-source models like ChatGLM. We exemplify the application of MemoryBank through the creation of an LLM-based chatbot named SiliconFriend in a long-term AI Companion scenario. Further tuned with psychological dialogs, SiliconFriend displays heightened empathy in its interactions. The experiment involves both qualitative analysis with real-world user dialogs and quantitative analysis with simulated dialogs. In the latter, ChatGPT acts as users with diverse characteristics and generates long-term dialog contexts covering a wide array of topics. The results of our analysis reveal that SiliconFriend, equipped with MemoryBank, exhibits a strong capability for long-term companionship as it can provide empathetic responses, recall relevant memories and understand user personality.", "output": "MemoryBank: Enhancing Large Language Models with Long-Term Memory."}, 
The codes areavailable at ", "output": "IMAGINATOR: Pre-Trained Image+Text Joint Embeddings using Word-Level Grounding of Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sampling-based path planning algorithms play an important role in autonomousrobotics. However, a common problem among the RRT-based algorithms is that theinitial path generated is not optimal and the convergence is too slow to beused in real-world applications. In this paper, we propose a novel image-basedlearning algorithm (CBAGAN-RRT) using a Convolutional Block AttentionGenerative Adversarial Network with a combination of spatial and channelattention and a novel loss function to design the heuristics, find a betteroptimal path, and improve the convergence of the algorithm both concerning timeand speed. The probability distribution of the paths generated from our GANmodel is used to guide the sampling process for the RRT algorithm. We train andtest our network on the dataset generated by cite{zhang2021generative} anddemonstrate that our algorithm outperforms the previous state-of-the-artalgorithms using both the image quality generation metrics like IOU Score, DiceScore, FID score, and path planning metrics like time cost and the number ofnodes. We conduct detailed experiments and ablation studies to illustrate thefeasibility of our study and show that our model performs well not only on thetraining dataset but also on the unseen test dataset. The advantage of ourapproach is that we can avoid the complicated preprocessing in the state space,our model can be generalized to complicated environments like those containingturns and narrow passages without loss of accuracy, and our model can be easilyintegrated with other sampling-based path planning algorithms.", "output": "CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cardiac diseases are one of the leading mortality factors in modern,industrialized societies, which cause high expenses in public health systems.Due to high costs, developing analytical methods to improve cardiac diagnosticsis essential. The heart's electric activity was first modeled using a set ofnonlinear differential equations. Following this, variations of cardiac spectraoriginating from deterministic dynamics are investigated. Analyzing a normalhuman heart's power spectra offers His-Purkinje network, which possesses afractal-like structure. Phase space trajectories are extracted from the timeseries electrocardiogram (ECG) graph with third-order derivate Taylor Series.Here in this study, phase space analysis and Convolutional Neural Networks(CNNs) method are applied to 44 records via the MIT-BIH database recorded withMLII. In order to increase accuracy, a straight line is drawn between thehighest Q-R distance in the phase space images of the records. Binary CNNclassification is used to determine healthy or unhealthy hearts. 
With a 90.90% accuracy rate, this model could classify records according to their heart status.", "output": "Understanding of Normal and Abnormal Hearts by Phase Space Analysis and Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compression for machines is an emerging field, where inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. Often performed by jointly optimizing the compression scheme for both the machine task and human perception, this results in sub-optimal rate-distortion (RD) performance for the machine side. We focus on the case of images, proposing to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme. Using our approach we improve an existing scalable codec to achieve superior RD performance on the machine task, while remaining competitive for human perception. Moreover, our approach can be trained post-hoc for any given ICM scheme, and without creating a coupling between the quality of the machine analysis and human vision.", "output": "VVC+M: Plug and Play Scalable Image Coding for Humans and Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While current talking head models are capable of generating photorealistic talking head videos, they provide limited pose controllability. Most methods require specific video sequences that should exactly contain the desired head pose, which is far from user-friendly pose control. Three-dimensional morphable models (3DMM) offer semantic pose control, but they fail to capture certain expressions. We present a novel method that utilizes parametric control of head orientation and facial expression over a pre-trained neural talking head model. To enable this, we introduce a landmark-parameter morphable model (LPMM), which offers control over the facial landmark domain through a set of semantic parameters. Using LPMM, it is possible to adjust specific head pose factors without distorting other facial attributes. The results show our approach provides intuitive rig-like control over neural talking head models, allowing both parameter and image-based inputs.", "output": "LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advancement of Deep Learning (DL) is driven by efficient Deep Neural Network (DNN) design and new hardware accelerators. Current DNN design is primarily tailored for general-purpose use and deployment on commercially viable platforms. Inference at the edge requires low latency, compact and power-efficient models, and must be cost-effective. Digital processors based on typical von Neumann architectures are not conducive to edge AI given the large amounts of required data movement in and out of memory. 
Conversely, analog/mixed signal in-memory computing hardware accelerators can easily transcend the memory wall of von Neumann architectures when accelerating inference workloads. They offer increased area and power efficiency, which are paramount in edge resource-constrained environments. In this paper, we propose AnalogNAS, a framework for automated DNN design targeting deployment on analog In-Memory Computing (IMC) inference accelerators. We conduct extensive hardware simulations to demonstrate the performance of AnalogNAS on State-Of-The-Art (SOTA) models in terms of accuracy and deployment efficiency on various Tiny Machine Learning (TinyML) tasks. We also present experimental results that show AnalogNAS models achieving higher accuracy than SOTA models when implemented on a 64-core IMC chip based on Phase Change Memory (PCM). The AnalogNAS search code is released: ", "output": "AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic generation of fonts can be an important aid to typeface design. Many current approaches regard glyphs as pixelated images, which present artifacts when scaling and inevitable quality losses after vectorization. On the other hand, existing vector font synthesis methods either fail to represent the shape concisely or require vector supervision during training. To push the quality of vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed \"positive\" and \"negative\" path pairs. The glyph contour is then obtained by boolean operations on these paths. We first learn such a representation only from glyph images and devise a subsequent contour refinement step to align the contour with an image representation to further enhance details. Our method, named DualVector, outperforms state-of-the-art methods in vector font synthesis both quantitatively and qualitatively. Our synthesized vector fonts can be easily converted to common digital font formats like TrueType Font for practical use. The code is released at", "output": "DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Estimating the 3DoF rotation from a single RGB image is an important yet challenging problem. As a popular approach, probabilistic rotation modeling additionally carries prediction uncertainty information, compared to single-prediction rotation regression. For modeling probabilistic distributions over SO(3), it is natural to use the Gaussian-like Bingham distribution and matrix Fisher distribution; however, they are shown to be sensitive to outlier predictions, e.g., $180^\\circ$ error, and thus are unlikely to converge with optimal performance. In this paper, we draw inspiration from the multivariate Laplace distribution and propose a novel rotation Laplace distribution on SO(3). Our rotation Laplace distribution is robust to the disturbance of outliers and enforces much gradient to the low-error region that it can improve. In addition, we show that our method also exhibits robustness to small noise and thus tolerates imperfect annotations. 
With this benefit, we demonstrate its advantages in semi-supervised rotation regression, where the pseudo labels are noisy. To further capture the multi-modal rotation solution space for symmetric objects, we extend our distribution to a rotation Laplace mixture model and demonstrate its effectiveness. Our extensive experiments show that our proposed distribution and the mixture model achieve state-of-the-art performance in all the rotation regression experiments over both probabilistic and non-probabilistic baselines.", "output": "Towards Robust Probabilistic Modeling on SO(3) via Rotation Laplace Distribution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-sensor clues have shown promise for object segmentation, but inherent noise in each sensor, as well as the calibration error in practice, may bias the segmentation accuracy. In this paper, we propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features, with the aim of controlling the modal contribution based on relative entropy. We explore semantics among the multimodal inputs in two aspects: the modality-shared consistency and the modality-specific variation. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision. On the one hand, the AF block explicitly dissociates the shared and specific representation and learns to weight the modal contribution by adjusting the proportion, region, and pattern, depending upon the quality. On the other hand, our CFD initially decodes the shared feature and then refines the output through specificity-aware querying. Further, we enforce semantic consistency across the decoding layers to enable interaction across network hierarchies, improving feature discriminability. Exhaustive comparison on eleven datasets with depth or thermal clues, and on two challenging tasks, namely salient and camouflage object segmentation, validate our effectiveness in terms of both performance and robustness.", "output": "Object Segmentation by Mining Cross-Modal Semantics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks.
It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\\times$ smaller model using significantly less computation than the prior art.", "output": "Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has led to increased interest in 3D scene editing. One important task in editing is removing objects from a scene while ensuring visual reasonability and multiview consistency. However, current methods face challenges such as time-consuming object labelling, limited capability to remove specific targets, and compromised rendering quality after removal. This paper proposes a novel object-removing pipeline, named OR-NeRF, that can remove objects from 3D scenes with either point or text prompts on a single view, achieving better performance in less time than previous works. Our method uses a points projection strategy to rapidly spread user annotations to all views, significantly reducing the processing burden. This algorithm allows us to leverage the recent 2D segmentation model Segment-Anything (SAM) to predict masks with improved precision and efficiency. Additionally, we obtain colour and depth priors through 2D inpainting methods. Finally, our algorithm employs depth supervision and perceptual loss for scene reconstruction to maintain consistency in geometry and appearance after object removal. Experimental results demonstrate that our method achieves better editing quality with less time than previous works, considering both quality and quantity.", "output": "OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The large-scale deployment of autonomous vehicles is yet to come, and one of the major remaining challenges lies in urban dense traffic scenarios. In such cases, it remains challenging to predict the future evolution of the scene and future behaviors of objects, and to deal with rare adverse events such as the sudden appearance of occluded objects. In this paper, we present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene. By reasoning on the temporal behavior of objects, our method can effectively process the interactions and relationships among features in different frames. Reasoning about the global information of the scene can also improve overall perception performance and benefit the detection of adverse events, especially the anticipation of potential danger from occluded objects. For comprehensive evaluation on occlusion events, we also release publicly a driving simulation benchmark DriveOcclusionSim consisting of diverse occlusion events.
We conduct extensive experiments on multiple CARLA benchmarks, where our model outperforms all prior methods, ranking first on the sensor track of the public CARLA Leaderboard.", "output": "ReasonNet: End-to-End Driving with Temporal and Global Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates the challenge of learning image manifolds, specifically pose manifolds, of 3D objects using limited training data. It proposes a DNN approach to manifold learning and for predicting images of objects for novel, continuous 3D rotations. The approach uses two distinct concepts: (1) Geometric Style-GAN (Geom-SGAN), which maps images to low-dimensional latent representations and maintains the (first-order) manifold geometry. That is, it seeks to preserve the pairwise distances between base points and their tangent spaces, and (2) uses Euler's elastica to smoothly interpolate between directed points (points + tangent directions) in the low-dimensional latent space. When mapped back to the larger image space, the resulting interpolations resemble videos of rotating objects. Extensive experiments establish the superiority of this framework in learning paths on rotation manifolds, both visually and quantitatively, relative to state-of-the-art GANs and VAEs.", "output": "Learning Pose Image Manifolds Using Geometry-Preserving GANs and Elasticae."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a rapidly growing need for multimodal content moderation (CM) as more and more content on social media is multimodal in nature. Existing unimodal CM systems may fail to catch harmful content that crosses modalities (e.g., memes or videos), which may lead to severe consequences. In this paper, we present a novel CM model, Asymmetric Mixed-Modal Moderation (AM3), to target multimodal and unimodal CM tasks. Specifically, to address the asymmetry in semantics between vision and language, AM3 has a novel asymmetric fusion architecture that is designed to not only fuse the common knowledge in both modalities but also to exploit the unique information in each modality. Unlike previous works that focus on fusing the two modalities while overlooking the intrinsic difference between the information conveyed in multimodality and in unimodality (asymmetry in modalities), we propose a novel cross-modality contrastive loss to learn the unique knowledge that only appears in multimodality. This is critical as some harmful intent may only be conveyed through the intersection of both modalities. With extensive experiments, we show that AM3 outperforms all existing state-of-the-art methods on both multimodal and unimodal CM benchmarks.", "output": "Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional supervised learning tasks require a label for every instance in the training set, but in many real-world applications, labels are only available for collections (bags) of instances.
This problem setting, known as multiple instance learning (MIL), is particularly relevant in the medical domain, where high-resolution images are split into smaller patches, but labels apply to the image as a whole. Recent MIL models are able to capture correspondences between patches by employing self-attention, allowing them to weigh each patch differently based on all other patches in the bag. However, these approaches still do not consider the relative spatial relationships between patches within the larger image, which is especially important in computational pathology. To this end, we introduce a novel MIL model with distance-aware self-attention (DAS-MIL), which explicitly takes into account relative spatial information when modelling the interactions between patches. Unlike existing relative position representations for self-attention, which are discrete, our approach introduces continuous distance-dependent terms into the computation of the attention weights, and is the first to apply relative position representations in the context of MIL. We evaluate our model on a custom MNIST-based MIL dataset that requires the consideration of relative spatial information, as well as on CAMELYON16, a publicly available cancer metastasis detection dataset, where we achieve a test AUROC score of 0.91. On both datasets, our model outperforms existing MIL approaches that employ absolute positional encodings, as well as existing relative position representation schemes applied to MIL. Our code is available at", "output": "Deep Multiple Instance Learning with Distance-Aware Self-Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative AI models like DALL-E 2 can interpret textual prompts and generate high-quality images exhibiting human creativity. Though public enthusiasm is booming, systematic auditing of potential gender biases in AI-generated images remains scarce. We addressed this gap by examining the prevalence of two occupational gender biases (representational and presentational biases) in 15,300 DALL-E 2 images spanning 153 occupations, and assessed potential bias amplification by benchmarking against 2021 census labor statistics and Google Images. Our findings reveal that DALL-E 2 underrepresents women in male-dominated fields while overrepresenting them in female-dominated occupations. Additionally, DALL-E 2 images tend to depict more women than men with smiling faces and downward-pitching heads, particularly in female-dominated (vs. male-dominated) occupations. Our computational algorithm auditing study demonstrates more pronounced representational and presentational biases in DALL-E 2 compared to Google Images and calls for feminist interventions to prevent such bias-laden AI-generated images from feeding back into the media ecology.", "output": "Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "NeRF is a popular model that efficiently represents 3D objects from 2D images. However, vanilla NeRF has a few important limitations. NeRF must be trained on each object separately. The training time is long since we encode the object's shape and color in neural network weights. Moreover, NeRF does not generalize well to unseen data.
In this paper, we present MultiPlaneNeRF -- the first model that simultaneously solves all the above problems. Our model works directly on 2D images. We project 3D points on 2D images to produce non-trainable representations. The projection step is not parametrized, and a very shallow decoder can efficiently process the representation. Using existing images as part of NeRF can significantly reduce the number of parameters since we train only a small implicit decoder. Furthermore, we can train MultiPlaneNeRF on a large data set and force our implicit decoder to generalize across many objects. Consequently, we can simply replace the 2D images (without additional training) to produce a NeRF representation of the new object. In the experimental section, we demonstrate that MultiPlaneNeRF achieves comparable results to state-of-the-art models for synthesizing new views and has generalization properties.", "output": "MultiPlaneNeRF: Neural Radiance Field with Non-Trainable Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present software that predicts non-cleft facial images for patients with cleft lip, thereby facilitating the understanding, awareness and discussion of cleft lip surgeries. To protect patients' privacy, we design a software framework using image inpainting, which does not require cleft lip images for training, thereby mitigating the risk of model leakage. We implement a novel multi-task architecture that predicts both the non-cleft facial image and facial landmarks, resulting in better performance as evaluated by surgeons. The software is implemented with PyTorch and is usable with consumer-level color images with a fast prediction speed, enabling effective deployment.", "output": "INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "CNN-based face detection methods have achieved significant progress in recent years. In addition to the strong representation ability of CNN, post-processing methods are also very important for the performance of face detection. In general, the face detection method predicts several candidate bounding-boxes for one face. NMS is used to filter out inaccurate candidate boxes to get the most accurate box. The principle of NMS is to select the box with a higher score as the basic box and then delete the box which has a large overlapping area with the basic box but has a lower score. However, the current NMS method and its improved versions do not perform well when face image quality is poor or faces are in a cluster. In these situations, even after NMS filtering, there is often a face corresponding to multiple predicted boxes. To reduce this kind of negative result, in this paper, we propose a new NMS method that operates in the reverse order of other NMS methods. Our method performs well on low-quality and tiny face samples.
Experiments demonstrate that our method is effective as a post-processor for different face detection methods.", "output": "Inverted Non-maximum Suppression for more Accurate and Neater Face Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "LIDAR and RADAR are two commonly used sensors in autonomous driving systems. The extrinsic calibration between the two is crucial for effective sensor fusion. The challenge arises due to the low accuracy and sparse information in RADAR measurements. This paper presents a novel solution for 3D RADAR-LIDAR calibration in autonomous systems. The method employs simple targets to generate data, including correspondence registration and a one-step optimization algorithm. The optimization aims to minimize the reprojection error while utilizing a small multi-layer perceptron (MLP) to perform regression on the return energy of the sensor around the targets. The proposed approach uses a deep learning framework such as PyTorch and can be optimized through gradient descent. The experiment uses a 360-degree Ouster-128 LIDAR and a 360-degree Navtech RADAR, providing raw measurements. The results validate the effectiveness of the proposed method in achieving improved estimates of extrinsic calibration parameters.", "output": "Improving Extrinsics between RADAR and LIDAR using Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a lot of research effort devoted by researchers to developing different techniques for neural network compression, yet the community seems to lack standardised ways of evaluating and comparing different compression techniques, which is key to identifying the most suitable compression technique for different applications. In this paper we contribute towards standardisation of neural network compression by providing a review of evaluation metrics. These metrics have been implemented into NetZIP, a standardised neural network compression bench. We showcase some of the metrics reviewed using three case studies focusing on object classification, object detection, and edge devices.", "output": "Evaluation Metrics for CNNs Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study aimed to solve the semantic gap and misalignment issue between encoding and decoding because of multiple convolutional and pooling operations in U-Net when segmenting subabdominal MRI images during rectal cancer treatment. An MRI image segmentation method is proposed based on a multi-scale feature pyramid network and dual attention mechanism. Our innovation is the design of two modules: 1) a dilated convolution and multi-scale feature pyramid network are used in the encoding to avoid the semantic gap. 2) a dual attention mechanism is designed to maintain spatial information of U-Net and reduce misalignment. Experiments on a subabdominal MRI image dataset show the proposed method achieves better performance than other methods.
In conclusion, a multi-scale feature pyramid network can reduce the semantic gap, and the dual attention mechanism can align features between encoding and decoding.", "output": "A Subabdominal MRI Image Segmentation Algorithm Based on Multi-Scale Feature Pyramid Network and Dual Attention Mechanism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Shadow removal is to restore shadow regions to their shadow-free counterparts while leaving non-shadow regions unchanged. State-of-the-art shadow removal methods train deep neural networks on collected shadow & shadow-free image pairs, which are desired to complete two distinct tasks via shared weights, i.e., data restoration for shadow regions and identical mapping for non-shadow regions. We find that these two tasks exhibit poor compatibility, and using shared weights for these two tasks could lead to the model being optimized towards only one task instead of both during the training process. Note that such a key issue is not identified by existing deep learning-based shadow removal methods. To address this problem, we propose to handle these two tasks separately and leverage the identical mapping results to guide the shadow restoration in an iterative manner. Specifically, our method consists of three components: an identical mapping branch (IMB) for non-shadow regions processing, an iterative de-shadow branch (IDB) for shadow regions restoration based on identical results, and a smart aggregation block (SAB). The IMB aims to reconstruct an image that is identical to the input one, which can benefit the restoration of the non-shadow regions without explicitly distinguishing between shadow and non-shadow regions. Utilizing the multi-scale features extracted by the IMB, the IDB can effectively transfer information from non-shadow regions to shadow regions progressively, facilitating the process of shadow removal. The SAB is designed to adaptively integrate features from both IMB and IDB. Moreover, it generates a finely tuned soft shadow mask that guides the process of removing shadows. Extensive experiments demonstrate our method outperforms all the state-of-the-art shadow removal approaches on the widely used shadow removal datasets.", "output": "Learning Restoration is Not Enough: Transfering Identical Mapping for Single-Image Shadow Removal."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks have consistently shown great performance in several real-world use cases like autonomous vehicles, satellite imaging, etc., effectively leveraging large corpora of labeled training data. However, learning unbiased models depends on building a dataset that is representative of a diverse range of realistic scenarios for a given task. This is challenging in many settings where data comes from high-volume streams, with each scenario occurring in random interleaved episodes at varying frequencies. We study realistic streaming settings where data instances arrive in and are sampled from an episodic multi-distributional data stream. Using submodular information measures, we propose STREAMLINE, a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled data via a three-step procedure of slice identification, slice-aware budgeting, and data selection.
We extensively evaluate STREAMLINE on real-world streaming scenarios for image classification and object detection tasks. We observe that STREAMLINE improves the performance on infrequent yet critical slices of the data over current baselines by up to $5\\%$ in terms of accuracy on our image classification tasks and by up to $8\\%$ in terms of mAP on our object detection tasks.", "output": "STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is not uncommon that real-world data are distributed with a long tail. For such data, the learning of deep neural networks becomes challenging because it is hard to classify tail classes correctly. In the literature, several existing methods have addressed this problem by reducing classifier bias, provided that the features obtained with long-tailed data are representative enough. However, we find that training directly on long-tailed data leads to an uneven embedding space. That is, the embedding space of head classes severely compresses that of tail classes, which is not conducive to subsequent classifier learning. This paper therefore studies the problem of long-tailed visual recognition from the perspective of the feature level. We introduce feature augmentation to balance the embedding distribution. The features of different classes are perturbed with varying amplitudes in Gaussian form. Based on these perturbed features, two novel logit adjustment methods are proposed to improve model performance at a modest computational overhead. Subsequently, the distorted embedding spaces of all classes can be calibrated. In such balanced-distributed embedding spaces, the biased classifier can be eliminated by simply retraining the classifier with class-balanced sampling data. Extensive experiments conducted on benchmark datasets demonstrate the superior performance of the proposed method over the state-of-the-art ones.", "output": "Adjusting Logit in Gaussian Form for Long-Tailed Visual Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation, that allows automatic and semi-automatic segmentation, and click-based refinement. DeepEdit combines the power of two methods: a non-interactive (i.e. automatic segmentation using nnU-Net, UNET or UNETR) and an interactive segmentation method (i.e. DeepGrow), into a single deep learning model. It allows easy integration of uncertainty-based ranking strategies (i.e. aleatoric and epistemic uncertainty computation) and active learning. We propose and implement a method for training DeepEdit by using standard training combined with user interaction simulation. Once trained, DeepEdit allows clinicians to quickly segment their datasets by using the algorithm in auto segmentation mode or by providing clicks via a user interface (i.e. 3D Slicer, OHIF).
We show the value of DeepEdit through evaluation on the PROSTATEx dataset for prostate/prostatic lesions and the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) dataset for abdominal CT segmentation, using state-of-the-art network architectures as the baseline for comparison. DeepEdit could reduce the time and effort required to annotate 3D medical images compared to DeepGrow alone. Source code is available at", "output": "DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization of diffusion models can significantly reduce the model size and accelerate the sampling process without requiring any re-training. Nonetheless, applying existing post-training quantization methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. Moreover, as the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) in late denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process. We first disentangle the quantization noise into its correlated and residual uncorrelated parts regarding its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we propose a mixed-precision scheme to choose the optimal bitwidth for each denoising step, which prefers low bits to accelerate the early denoising steps while high bits maintain the high SNR for the late steps. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models in generating high-quality samples, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations.", "output": "PTQD: Accurate Post-Training Quantization for Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly supervised learning based on scribble annotations in target extraction of remote sensing images has drawn much interest due to scribbles' flexibility in denoting winding objects and the low cost of manual labeling. However, scribbles are too sparse to identify object structure and detailed information, bringing great challenges in target localization and boundary description. To alleviate these problems, in this paper, we construct two inner structure-constraints, a deformation consistency loss and a trainable active contour loss, together with a scribble-constraint to supervise the optimization of the encoder-decoder network without introducing any auxiliary module or extra operation based on prior cues.
Comprehensive experiments demonstrate our method's superiority over five state-of-the-art algorithms in this field. Source code is available at ", "output": "Scribble-Supervised Target Extraction Method Based on Inner Structure-Constraint for Remote Sensing Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A number of deep models trained on high-quality and valuable images have been deployed in practical applications, which may pose a leakage risk of data privacy. Learning differentially private generative models can sidestep this challenge through indirect data access. However, such differentially private generative models learned by existing approaches can only generate images with a low resolution of less than 128x128, hindering the widespread usage of generated images in downstream training. In this work, we propose learning differentially private probabilistic models (DPPM) to generate high-resolution images with a differential privacy guarantee. In particular, we first train a model to fit the distribution of the training data and make it satisfy differential privacy by performing a randomized response mechanism during the training process. Then we perform Hamiltonian dynamics sampling along with the differentially private movement direction predicted by the trained probabilistic model to obtain the privacy-preserving images. In this way, it is possible to apply these images to different downstream tasks while protecting private information. Notably, compared to other state-of-the-art differentially private generative approaches, our approach can generate images up to 256x256 with remarkable visual quality and data utility. Extensive experiments show the effectiveness of our approach.", "output": "Learning Differentially Private Probabilistic Models for Privacy-Preserving Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic, demonstrating their ability to deceive human perception and deep neural networks with stealth and success. However, current works usually sacrifice unrestricted degrees and subjectively select some image content to guarantee the photorealism of unrestricted adversarial examples, which limits their attack performance. To ensure the photorealism of adversarial examples and boost attack performance, we propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack. By leveraging a low-dimensional manifold that represents natural images, we map the images onto the manifold and optimize them along its adversarial direction. Therefore, within this framework, we implement Adversarial Content Attack based on Stable Diffusion and can generate highly transferable unrestricted adversarial examples with various adversarial contents.
Extensive experimentation and visualization demonstrate the efficacy of ACA, particularly in surpassing state-of-the-art attacks by an average of 13.3-50.4% and 16.8-48.0% on normally trained models and defense methods, respectively.", "output": "Content-based Unrestricted Adversarial Attack."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent times, contrastive learning based loss functions have become increasingly popular for visual self-supervised representation learning owing to their state-of-the-art (SOTA) performance. Most of the modern contrastive learning loss functions like SimCLR are Info-NCE based and generalize only to one positive and multiple negatives per anchor. A recent state-of-the-art, supervised contrastive (SupCon) loss, extends self-supervised contrastive learning to the supervised setting by generalizing to multiple positives and multiple negatives in a batch and improves upon the cross-entropy loss. In this paper, we propose a novel contrastive loss function - Tuned Contrastive Learning (TCL) loss, that generalizes to multiple positives and multiple negatives within a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives. We provide theoretical analysis of our loss function's gradient response and show mathematically how it is better than that of SupCon loss. Empirically, we compare our loss function with SupCon loss and cross-entropy loss in a supervised setting on multiple classification-task datasets. We also show the stability of our loss function to various hyper-parameter settings. Finally, we compare TCL with various SOTA self-supervised learning methods and show that our loss function achieves performance on par with SOTA methods in both supervised and self-supervised settings.", "output": "Tuned Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Action knowledge involves the understanding of textual, visual, and temporal aspects of actions. We introduce the Action Dynamics Benchmark (ActionBench) containing two carefully designed probing tasks: Action Antonym and Video Reversal, which target the multimodal alignment capabilities and temporal understanding skills of the model, respectively. Despite recent video-language models' (VidLM) impressive performance on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, Paxion, along with a new Discriminative Video Dynamics Modeling (DVDM) objective. The Paxion framework utilizes a Knowledge Patcher network to encode new action knowledge and a Knowledge Fuser component to integrate the Patcher into frozen VidLMs without compromising their existing capabilities. Due to limitations of the widely-used Video-Text Contrastive (VTC) loss for learning action knowledge, we introduce the DVDM objective to train the Knowledge Patcher. DVDM forces the model to encode the correlation between the action text and the correct ordering of video frames.
Our extensive analyses show that Paxion and DVDM together effectively fill the gap in action knowledge understanding (~50% to 80%), while maintaining or improving performance on a wide spectrum of both object- and action-centric downstream tasks.", "output": "Paxion: Patching Action Knowledge in Video-Language Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The unauthorized use of personal data for commercial purposes and the clandestine acquisition of private data for training machine learning models continue to raise concerns. In response to these issues, researchers have proposed availability attacks that aim to render data unexploitable. However, many current attack methods are rendered ineffective by adversarial training. In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective. Building on these observations, we introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements. We have conducted extensive experiments to substantiate the soundness of our approach. Moreover, our method establishes a robust foundation for future research in this area.", "output": "Re-thinking Data Availablity Attacks Against Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although recent personalization methods have democratized high-resolution image synthesis by enabling swift concept acquisition with minimal examples and lightweight computation, they also present an exploitable avenue for highly accessible backdoor attacks. This paper investigates a critical and unexplored aspect of text-to-image (T2I) diffusion models - their potential vulnerability to backdoor attacks via personalization. Our study focuses on a zero-day backdoor vulnerability prevalent in two families of personalization methods, epitomized by Textual Inversion and DreamBooth. Compared to traditional backdoor attacks, our proposed method can facilitate more precise, efficient, and easily accessible attacks with a lower barrier to entry. We provide a comprehensive review of personalization in T2I diffusion models, highlighting the operation and exploitation potential of this backdoor vulnerability. To be specific, by studying the prompt processing of Textual Inversion and DreamBooth, we have devised dedicated backdoor attacks according to the different ways of dealing with unseen tokens and analyzed the influence of triggers and concept images on the attack effect.
Our empirical study has shown that the nouveau-token backdoor attack has better attack performance while the legacy-token backdoor attack is potentially harder to defend against.", "output": "Zero-Day Backdoor Attack against Text-to-Image Diffusion Models via Personalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, vision-language pre-training frameworks have made significant progress in natural language processing and computer vision, achieving remarkable performance improvement on various downstream tasks. However, when extended to point cloud data, existing works mainly focus on building task-specific models, and fail to extract universal 3D vision-language embeddings that generalize well. We carefully investigate three common tasks in semantic 3D scene understanding, and derive key insights into the development of a pre-training model. Motivated by these observations, we propose a vision-language pre-training framework 3DVLP (3D vision-language pre-training with object contrastive learning), which transfers flexibly on 3D vision-language downstream tasks. 3DVLP takes visual grounding as the proxy task and introduces Object-level IoU-guided Detection (OID) loss to obtain high-quality proposals in the scene. Moreover, we design the Object-level Cross-Contrastive alignment (OCC) task and Object-level Self-Contrastive learning (OSC) task to align the objects with descriptions and distinguish different objects in the scene, respectively. Extensive experiments verify the excellent performance of 3DVLP on three 3D vision-language tasks, reflecting its superiority in semantic 3D scene understanding.", "output": "Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tunes the model via attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks, with superior results on few-shot image-text matching.", "output": "Discriminative Diffusion Models as Few-shot Vision and Language Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel framework, i.e., Segment Any Anomaly + (SAA+), for zero-shot anomaly segmentation with hybrid prompt regularization to improve the adaptability of modern foundation models.
Existing anomaly segmentation models typically rely on domain-specific fine-tuning, limiting their generalization across countless anomaly patterns. In this work, inspired by the great zero-shot generalization ability of foundation models like Segment Anything, we first explore their assembly to leverage diverse multi-modal prior knowledge for anomaly localization. For non-parameter foundation model adaptation to anomaly segmentation, we further introduce hybrid prompts derived from domain expert knowledge and target image context as regularization. Our proposed SAA+ model achieves state-of-the-art performance on several anomaly segmentation benchmarks, including VisA, MVTec-AD, MTD, and KSDD2, in the zero-shot setting. We will release the code at", "output": "Segment Any Anomaly without Training via Hybrid Prompt Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The transformer extends its success from the language to the vision domain. Because of the stacked self-attention and cross-attention blocks, the acceleration deployment of vision transformers on GPU hardware is challenging and also rarely studied. This paper thoroughly designs a compression scheme to maximally utilize the GPU-friendly 2:4 fine-grained structured sparsity and quantization. Specifically, an original large model with dense weight parameters is first pruned into a sparse one by 2:4 structured pruning, which considers the GPU's acceleration of the 2:4 structured sparse pattern with FP16 data type; then the floating-point sparse model is further quantized into a fixed-point one by sparse-distillation-aware quantization-aware training, which considers that the GPU can provide an extra speedup for 2:4 sparse calculation with integer tensors. A mixed-strategy knowledge distillation is used during the pruning and quantization process. The proposed compression scheme is flexible to support supervised and unsupervised learning styles. Experiment results show the GPUSQ-ViT scheme achieves state-of-the-art compression by reducing vision transformer models 6.4-12.7 times on model size and 30.3-62 times on FLOPs with negligible accuracy degradation on ImageNet classification, COCO detection and ADE20K segmentation benchmarking tasks. Moreover, GPUSQ-ViT can boost actual deployment performance by 1.39-1.79 times and 3.22-3.43 times of latency and throughput on A100 GPU, and 1.57-1.69 times and 2.11-2.51 times improvement of latency and throughput on AGX Orin.", "output": "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In MRI, images of the same contrast (e.g., T1) from the same subject can show noticeable differences when acquired using different hardware, sequences, or scan parameters. These differences in images create a domain gap that needs to be bridged by a step called image harmonization, in order to process the images successfully using conventional or deep learning-based image analysis (e.g., segmentation). Several methods, including deep learning-based approaches, have been proposed to achieve image harmonization. However, they often require datasets of multiple characteristics for deep learning training and may still be unsuccessful when applied to images of an unseen domain.
To address this limitation, we propose a novel concept called \"Blind Harmonization,\" which utilizes only target domain data for training but still has the capability of harmonizing unseen domain images. For the implementation of Blind Harmonization, we developed BlindHarmony using an unconditional flow model trained on target domain data. The harmonized image is optimized to have a correlation with the input source domain image while ensuring that the latent vector of the flow model is close to the center of the Gaussian. BlindHarmony was evaluated using simulated and real datasets and compared with conventional methods. BlindHarmony achieved noticeable performance on both datasets, highlighting its potential for future use in clinical settings.", "output": "BlindHarmony: \"Blind\" Harmonization for MR Images via Flow model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective connectivity can describe the causal patterns among brain regions. These patterns have the potential to reveal the pathological mechanism and promote early diagnosis and effective drug development for cognitive disease. However, current studies mainly focus on using empirical functional time series to calculate effective connections, which may not comprehensively capture the complex causal relationships between brain regions. In this paper, a novel Multi-resolution Spatiotemporal Enhanced Transformer Denoising (MSETD) network with an adversarially functional diffusion model is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment (MCI) analysis. To be specific, the denoising framework leverages a conditional diffusion process that progressively translates the noise and conditioning fMRI to effective connectivity in an end-to-end manner. To ensure reverse diffusion quality and diversity, the multi-resolution enhanced transformer generator is designed to extract local and global spatiotemporal features. Furthermore, a multi-scale diffusive transformer discriminator is devised to capture the temporal patterns at different scales for generation stability. Evaluations on the ADNI datasets demonstrate the feasibility and efficacy of the proposed model. The proposed model not only achieves superior prediction performance compared with other competing methods but also identifies MCI-related causal connections that are consistent with clinical studies.", "output": "Multi-resolution Spatiotemporal Enhanced Transformer Denoising with Functional Diffusive GANs for Constructing Brain Effective Connectivity in MCI analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up 3D representations to enable open-world 3D shape understanding. To achieve this, we scale up training data by ensembling multiple 3D datasets and propose several strategies to automatically filter and enrich noisy text descriptions. We also explore and compare strategies for scaling 3D backbone networks and introduce a novel hard negative mining module for more efficient training.
We evaluate OpenShape on zero-shot 3D classification benchmarks and demonstrate its superior capabilities for open-world recognition. Specifically, OpenShape achieves a zero-shot accuracy of 46.8% on the 1,156-category Objaverse-LVIS benchmark, compared to less than 10% for existing methods. OpenShape also achieves an accuracy of 85.3% on ModelNet40, outperforming previous zero-shot baseline methods by 20% and performing on par with some fully-supervised methods. Furthermore, we show that our learned embeddings encode a wide range of visual and semantic concepts (e.g., subcategories, color, shape, style) and facilitate fine-grained text-3D and image-3D interactions. Due to their alignment with CLIP embeddings, our learned shape representations can also be integrated with off-the-shelf CLIP-based models for various applications, such as point cloud captioning and point cloud-conditioned image generation.", "output": "OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attack is commonly regarded as a huge threat to neural networks because of its misleading behavior. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples. This distribution mismatch and the mutual learning mechanism with the same learning ratio applied in prior-art defense strategies are the main causes leading to the accuracy degradation for benign samples. The proposed AdvAmd is demonstrated to steadily heal the accuracy degradation and even lead to a certain accuracy boost of common neural models on benign classification, object detection, and segmentation tasks. The efficacy of AdvAmd is contributed by three key components: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and provide a smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities), as demonstrated through quantitative and ablation experiments.", "output": "Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically involves performing a large number of sampling steps, which impedes the possibility of real-time sample synthesis. Traditional accelerated sampling algorithms via knowledge distillation rely on pre-trained model weights and discrete time step scenarios, necessitating additional training sessions to achieve their goals.
To address these issues, we propose the Catch-Up Distillation (CUD), which encourages the current moment output of the velocity estimation model to ``catch up'' with its previous moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current moment output with both the ground truth label and the previous moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space for CUD under continuous time-step scenarios and analyze how to determine the suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain a FID of 2.80 by sampling in 15 steps under one-session training and the new state-of-the-art FID of 3.37 by sampling in one step with additional training. This latter result necessitated only 62w iterations with a batch size of 128, in contrast to Consistency Distillation, which demanded 210w iterations with a larger batch size of 256.", "output": "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks frequently suffer from performance degradation when the training data is long-tailed because several majority classes dominate the training, resulting in a biased model. Recent studies have made a great effort in solving this issue by obtaining good representations from data space, but few of them pay attention to the influence of feature norm on the predicted results. In this paper, we therefore address the long-tailed problem from feature space and thereby propose the feature-balanced loss. Specifically, we encourage larger feature norms of tail classes by giving them relatively stronger stimuli. Moreover, the stimuli intensity is gradually increased in the way of curriculum learning, which improves the generalization of the tail classes, meanwhile maintaining the performance of the head classes. Extensive experiments on multiple popular long-tailed recognition benchmarks demonstrate that the feature-balanced loss achieves superior performance gains compared with the state-of-the-art methods.", "output": "Feature-Balanced Loss for Long-Tailed Visual Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As Deepfake content continues to proliferate on the internet, advancing face manipulation forensics has become a pressing issue. To combat this emerging threat, previous methods mainly focus on studying how to distinguish authentic and manipulated face images. Despite being impressive, image-level classification lacks explainability and is limited to some specific application scenarios. Existing forgery localization methods suffer from imprecise and inconsistent pixel-level annotations. To alleviate these problems, this paper first re-constructs the FaceForensics++ dataset by introducing pixel-level annotations, then builds an extensive benchmark for localizing tampered regions. Next, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization.
Specifically, inspired by the power of frequency-related forgery traces, we design the Multi-Spectral Class Center (MSCC) module to learn more generalizable and semantic-agnostic features. Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations. Applying multi-spectral class-level representations suppresses the semantic information of the visual concepts, which is insensitive to manipulations. Furthermore, we propose a Multi-level Features Aggregation (MFA) module to employ more low-level forgery artifacts and structural textures. Experimental results quantitatively and qualitatively indicate the effectiveness and superiority of the proposed MSCCNet on comprehensive localization benchmarks. We expect this work to inspire more studies on pixel-level face manipulation localization. The annotations and code will be available.", "output": "Multi-spectral Class Center Network for Face Manipulation Detection and Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision-language pre-training (VLP) models have been demonstrated to be effective in many computer vision applications. In this paper, we consider developing a VLP model in the medical domain for making computer-aided diagnoses (CAD) based on image scans and text descriptions in electronic health records, as done in practice. To achieve our goal, we present a lightweight CAD system, MedBLIP, a new paradigm for bootstrapping VLP from off-the-shelf frozen pre-trained image encoders and frozen large language models. We design a MedQFormer module to bridge the gap between 3D medical images and 2D pre-trained image encoders and language models as well. To evaluate the effectiveness of our MedBLIP, we collect more than 30,000 image volumes from five public Alzheimer's disease (AD) datasets, i.e., ADNI, NACC, OASIS, AIBL, and MIRIAD. On this dataset, the largest AD dataset we know of, our model achieves SOTA performance on the zero-shot classification of healthy, mild cognitive impairment (MCI), and AD subjects, and shows its capability for medical visual question answering (VQA). The code and pre-trained models are available online: ", "output": "MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "DEtection TRansformer (DETR) and its variants (DETRs) have achieved impressive performance in general object detection. However, in crowded pedestrian detection, the performance of DETRs is still unsatisfactory due to an inappropriate sample selection method that results in more false positives. To settle the issue, we propose a simple but effective sample selection method for DETRs, Sample Selection for Crowded Pedestrians (SSCP), which consists of the constraint-guided label assignment scheme (CGLA) and the utilizability-aware focal loss (UAFL). Our core idea is to select learnable samples for DETRs and adaptively regulate the loss weights of samples based on their utilizability. Specifically, in CGLA, we propose a new cost function to ensure that only learnable positive training samples are retained and the rest are treated as negative training samples.
Further, considering the utilizability of samples, we design UAFL to adaptively assign different loss weights to learnable positive samples depending on their gradient ratio and IoU. Experimental results show that the proposed SSCP effectively improves the baselines without introducing any overhead in inference. In particular, Iter Deformable DETR is improved to 39.7% (-2.0%) MR on CrowdHuman and 31.8% (-0.4%) MR on CityPersons.", "output": "Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a transformer-based learned image compression system. It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest (ROI) functionality. Inspired by prompt tuning, we introduce prompt generation networks to condition the transformer-based autoencoder for compression. Our prompt generation networks generate content-adaptive tokens according to the input image, an ROI mask, and a rate parameter. The separation of the ROI mask and the rate parameter allows an intuitive way to achieve variable-rate and ROI coding simultaneously. Extensive experiments validate the effectiveness of our proposed method and confirm its superiority over the other competing methods.", "output": "Transformer-based Variable-rate Image Compression with Region-of-interest Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains, into a self-training scheme (e.g., the popular Self-Paced Self-Training) to encourage more discriminative transferable representations of regression tasks. Moreover, learning unified implicit neural functions to estimate the relative direction and distance of targets to their nearest class bins aims to refine target classification predictions, which can gain robust performance against inconsistent feature scaling sensitive to UDA regressors. Experimental results on three public benchmarks of the challenging 6D pose estimation task verify the effectiveness of our method, which consistently achieves superior performance to the state-of-the-art for UDA on 6D pose estimation.", "output": "Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work describes a Tree Ring Detection method for complete Cross-Sections of trees (CS-TRD). The method is based on the detection, processing, and connection of edges corresponding to the tree's growth rings.
The method depends on the parameters of the Canny Devernay edge detector (σ and two thresholds), a resize factor, the number of rays, and the pith location. The first five parameters are fixed by default. The pith location can be marked manually or detected using an automatic pith detection algorithm. Besides the pith localization, the CS-TRD method is fully automated and achieves an F-score of 89% on the UruDendro dataset (of Pinus taeda), with a mean execution time of 17 seconds, and of 97% on the Kennel dataset (of Abies alba), with an average execution time of 11 seconds.", "output": "CS-TRD: a Cross Sections Tree Ring Detection method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion model based language-guided image editing has achieved great success recently. However, existing state-of-the-art diffusion models struggle with rendering correct text and text style during generation. To tackle this problem, we propose a universal self-supervised text editing diffusion model (DiffUTE), which aims to replace or modify words in the source image with another one while maintaining its realistic appearance. Specifically, we build our model on a diffusion model and carefully modify the network structure to enable the model to draw multilingual characters with the help of glyph and position information. Moreover, we design a self-supervised learning framework to leverage large amounts of web data to improve the representation ability of the model. Experimental results show that our method achieves impressive performance and enables controllable editing on in-the-wild images with high fidelity. Our code will be available at ", "output": "DiffUTE: Universal Text Editing Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a novel explainable image quality evaluation approach called X-IQE, which leverages visual large language models (LLMs) to evaluate text-to-image generation methods by generating textual explanations. X-IQE utilizes a hierarchical Chain of Thought (CoT) to enable MiniGPT-4 to produce self-consistent, unbiased texts that are highly correlated with human evaluation. It offers several advantages, including the ability to distinguish between real and generated images, evaluate text-image alignment, and assess image aesthetics without requiring model training or fine-tuning. X-IQE is more cost-effective and efficient compared to human evaluation, while significantly enhancing the transparency and explainability of deep image quality evaluation models. We validate the effectiveness of our method as a benchmark using images generated by prevalent diffusion models.
X-IQE demonstrates performance similar to state-of-the-art (SOTA) evaluation methods on COCO Caption, while overcoming the limitations of previous evaluation models on DrawBench, particularly in handling ambiguous generation prompts and text recognition in generated images. Project website:", "output": "X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts. The LDM3D model is fine-tuned on a dataset of tuples containing an RGB image, a depth map, and a caption, and validated through extensive experiments. We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences using TouchDesigner. This technology has the potential to transform a wide range of industries, from entertainment and gaming to architecture and design. Overall, this paper presents a significant contribution to the field of generative AI and computer vision, and showcases the potential of LDM3D and DepthFusion to revolutionize content creation and digital experiences. A short video summarizing the approach can be found at ", "output": "LDM3D: Latent Diffusion Model for 3D."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As a fundamental problem in computer vision, 3D point cloud registration (PCR) aims to seek the optimal pose to align a point cloud pair. In this paper, we present a 3D registration method with maximal cliques (MAC). The key insight is to loosen the previous maximum clique constraint and mine more local consensus information in a graph for accurate pose hypothesis generation: 1) A compatibility graph is constructed to render the affinity relationship between initial correspondences. 2) We search for maximal cliques in the graph, each of which represents a consensus set. We then perform node-guided clique selection, where each node corresponds to the maximal clique with the greatest graph weight. 3) Transformation hypotheses are computed for the selected cliques by the SVD algorithm and the best hypothesis is used to perform registration. Extensive experiments on U3M, 3DMatch, 3DLoMatch and KITTI demonstrate that MAC effectively increases registration accuracy, outperforms various state-of-the-art methods and boosts the performance of deep-learned methods. MAC combined with deep-learned methods achieves a state-of-the-art registration recall of 95.7% / 78.9% on 3DMatch / 3DLoMatch.", "output": "3D Registration with Maximal Cliques."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text-image dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable, creating high-quality text images using text prompts alone or together with text template images, and conducting text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at ", "output": "TextDiffuser: Diffusion Models as Text Painters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community. From a security perspective, it poses a critical risk for modern vision systems, e.g., the popular Deep Learning as a Service (DLaaS) frameworks. For protecting off-the-shelf deep models while not modifying them, current algorithms typically detect adversarial patterns through discriminative decomposition of natural-artificial data. However, these decompositions are biased towards frequency or spatial discriminability, thus failing to capture subtle adversarial patterns comprehensively. More seriously, they are typically invertible, meaning a successful defense-aware (secondary) adversarial attack (i.e., evading the detector as well as fooling the model) is practical under the assumption that the adversary is fully aware of the detector (i.e., Kerckhoffs's principle). Motivated by such facts, we propose an accurate and secure adversarial example detector, relying on a spatial-frequency discriminative decomposition with secret keys. It expands the above works in two aspects: 1) the introduced Krawtchouk basis provides better spatial-frequency discriminability and is thereby more suitable for capturing adversarial patterns than the common trigonometric or wavelet bases; 2) the extensive parameters for decomposition are generated by a pseudo-random function with secret keys, hence blocking the defense-aware adversarial attack. Theoretical and numerical analysis demonstrates the increased accuracy and security of our detector w.r.t. a number of state-of-the-art algorithms.", "output": "Towards an Accurate and Secure Detector against Adversarial Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Learning is currently used to perform multiple tasks, such as object recognition, face recognition, and natural language processing. However, Deep Neural Networks (DNNs) are vulnerable to perturbations that alter the network prediction (adversarial examples), raising concerns regarding their usage in critical areas, such as self-driving vehicles, malware detection, and healthcare.
This paper compiles the most recent adversarial attacks, grouped by attacker capacity, and modern defenses clustered by protection strategies. We also present the new advances regarding Vision Transformers, summarize the datasets and metrics used in the context of adversarial settings, and compare the state-of-the-art results under different attacks, finishing with the identification of open issues.", "output": "How Deep Learning Sees the World: A Survey on Adversarial Attacks & Defenses."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Incremental few-shot semantic segmentation (IFSS) aims to incrementally extend a semantic segmentation model to novel classes according to only a few pixel-level annotated data, while preserving its segmentation capability on previously learned base categories. This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance, which makes segmentation results unsatisfactory. To alleviate this issue, we propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information. Specifically, we first conduct Semantic Relation Alignment (SRA) in the base step, so as to semantically align base class representations to their semantics. As a result, the embeddings of base classes are constrained to have relatively low semantic correlations to categories that are different from them. Afterwards, based on the semantically aligned base categories, Semantic-Guided Adaptation (SGA) is employed during the incremental learning stage. It aims to ensure affinities between visual and semantic embeddings of encountered novel categories, thereby making the feature representations be consistent with their semantic information. In this way, the semantic-aliasing issue can be suppressed. We evaluate our model on the PASCAL VOC 2012 and the COCO dataset. The experimental results on both of these datasets exhibit its competitive performance, which demonstrates the superiority of our method.", "output": "Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present VideoFactory, an innovative framework for generating high-quality open-domain videos. VideoFactory excels in producing high-definition (1376x768), widescreen (16:9) videos without watermarks, creating an engaging user experience. Generating videos guided by text instructions poses significant challenges, such as modeling the complex relationship between space and time, and the lack of large-scale text-video paired data. Previous approaches extend pretrained text-to-image generation models by adding temporal 1D convolution/attention modules for video generation. However, these approaches overlook the importance of jointly modeling space and time, inevitably leading to temporal distortions and misalignment between texts and videos. In this paper, we propose a novel approach that strengthens the interaction between spatial and temporal perceptions. In particular, we utilize a swapped cross-attention mechanism in 3D windows that alternates the \"query\" role between spatial and temporal blocks, enabling mutual reinforcement for each other.
To fully unlock model capabilities for high-quality video generation, we curate a large-scale video dataset called HD-VG-130M. This dataset comprises 130 million text-video pairs from the open domain, ensuring high-definition, widescreen, and watermark-free characteristics. Objective metrics and user studies demonstrate the superiority of our approach in terms of per-frame quality, temporal correlation, and text-video alignment, with clear margins.", "output": "VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper addresses the problem of translating night-time thermal infrared images, which are the most adopted image modality for analyzing night-time scenes, to daytime color images (NTIT2DC), which provide better perceptions of objects. We introduce a novel model that focuses on enhancing the quality of the target generation without merely colorizing it. The proposed structural-aware GAN (StawGAN) enables the translation of better-shaped and high-definition objects in the target domain. We test our model on aerial images of the DroneVehicle dataset containing RGB-IR paired images. The proposed approach produces a more accurate translation with respect to other state-of-the-art image translation models. The source code is available at", "output": "StawGAN: Structural-Aware Generative Adversarial Networks for Infrared Image Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician, which can be imitated by utilizing supervised deep-learning techniques. However, real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain-adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and the style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant domain differences between datasets, while ArtFlow is introduced to further reduce the discrepancies between datasets. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF to the state-of-the-art domain-adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability.", "output": "Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-world image manipulation has achieved fantastic progress in recent years. GAN inversion, which aims to map the real image to the latent code faithfully, is the first step in this pipeline.
However, existing GAN inversion methods fail to achieve high reconstruction quality and fast inference at the same time. In addition, existing methods are built on 2D GANs and lack explicit mechanisms to enforce multi-view consistency. In this work, we present a novel meta-auxiliary framework, while leveraging the newly developed 3D GANs as the generator. The proposed method adopts a two-stage strategy. In the first stage, we invert the input image to an editable latent code using off-the-shelf inversion techniques. The auxiliary network is proposed to refine the generator parameters with the given image as input, predicting both offsets for the weights of convolutional layers and sampling positions for volume rendering. In the second stage, we perform meta-learning to quickly adapt the auxiliary network to the input image, and then the final reconstructed image is synthesized via the meta-learned auxiliary network. Extensive experiments show that our method achieves better performance on both inversion and editing tasks.", "output": "Meta-Auxiliary Network for 3D GAN Inversion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Low-light image enhancement (LLIE) is an ill-posed inverse problem due to the lack of knowledge of the desired image, which is obtained under ideal illumination conditions. Low-light conditions give rise to two main issues: a suppressed image histogram and inconsistent relative color distributions with a low signal-to-noise ratio. In order to address these problems, we propose a novel approach named FLIGHT-Net using a sequence of neural architecture blocks. The first block regulates illumination conditions through pixel-wise scene-dependent illumination adjustment. The output image is produced by the second block, which includes channel attention and denoising sub-blocks. Our highly efficient neural network architecture delivers state-of-the-art performance with only 25K parameters. The method's code, pretrained models, and resulting images will be publicly available.", "output": "FLIGHT Mode On: A Feather-Light Network for Low-Light Image Enhancement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In knowledge distillation, the knowledge from the teacher model is often too complex for the student model to thoroughly process. However, good teachers in real life always simplify complex material before teaching it to students. Inspired by this fact, we propose student-friendly knowledge distillation (SKD) to simplify teacher output into new knowledge representations, which makes the learning of the student model easier and more effective. SKD contains a softening processing and a learning simplifier. First, the softening processing uses the temperature hyperparameter to soften the output logits of the teacher model, which simplifies the output to some extent and makes it easier for the learning simplifier to process. The learning simplifier utilizes the attention mechanism to further simplify the knowledge of the teacher model and is jointly trained with the student model using the distillation loss, which means that the process of simplification is correlated with the training objective of the student model and ensures that the simplified new teacher knowledge representation is more suitable for the specific student model.
Furthermore, since SKD does not change the form of the distillation loss, it can be easily combined with other distillation methods that are based on the logits or features of intermediate layers to enhance its effectiveness. Therefore, SKD has wide applicability. The experimental results on the CIFAR-100 and ImageNet datasets show that our method achieves state-of-the-art performance while maintaining high training efficiency.", "output": "Student-friendly Knowledge Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing interest in and rapid development of methods for Ultra-High Resolution (UHR) segmentation, a large-scale benchmark covering a wide range of scenes with full fine-grained dense annotations is urgently needed to facilitate the field. To this end, the URUR dataset is introduced, standing for an Ultra-High Resolution dataset with Ultra-Rich Context. As the name suggests, URUR contains a large number of images of sufficiently high resolution (3,008 images of size 5,120x5,120), a wide range of complex scenes (from 63 cities), rich-enough context (1 million instances with 8 categories), and fine-grained annotations (about 80 billion manually annotated pixels), which is far superior to all existing UHR datasets, including DeepGlobe, Inria Aerial, UDD, etc. Moreover, we also propose WSDNet, a more efficient and effective framework for UHR segmentation, especially with ultra-rich context. Specifically, a multi-level Discrete Wavelet Transform (DWT) is naturally integrated to relieve the computation burden while preserving more spatial details, along with a Wavelet Smooth Loss (WSL) to reconstruct the original structured context and texture with a smooth constraint. Experiments on several UHR datasets demonstrate its state-of-the-art performance. The dataset is available at ", "output": "Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions. Compared to the supervised approach, learning is more difficult since bounding-box and textual-phrase correspondences are unavailable. In light of this, we propose the Semantic Prior Refinement Model (SPRM), whose predictions are obtained by combining the output of two main modules. The first, untrained module aims to return a rough alignment between textual phrases and bounding boxes. The second, trained module is composed of two sub-components that refine the rough alignment to improve the accuracy of the final phrase-bounding box alignments. The model is trained to maximize the multimodal similarity between an image and a sentence, while minimizing the multimodal similarity of the same sentence and a new unrelated image, carefully selected to help the most during training. Our approach shows state-of-the-art results on two popular datasets, Flickr30k Entities and ReferIt, shining especially on ReferIt with a 9.6% absolute improvement.
Moreover, thanks to the untrained component, it reaches competitive performance using just a small fraction of training examples.", "output": "Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How can we reliably transfer affect models trained in controlled laboratory conditions (in-vitro) to uncontrolled real-world settings (in-vivo)? The information gap between in-vitro and in-vivo applications defines a core challenge of affective computing. This gap is caused by limitations related to affect sensing, including intrusiveness, hardware malfunctions, and availability of sensors. As a response to these limitations, we introduce the concept of privileged information for operating affect models in real-world scenarios (in the wild). Privileged information enables affect models to be trained across multiple modalities available in a lab and to ignore, without significant performance drops, those modalities that are not available when they operate in the wild. Our approach is tested on two multimodal affect databases, one of which is designed for testing models of affect in the wild. By training our affect models using all modalities and then using solely raw footage frames for testing the models, we reach the performance of models that fuse all available modalities for both training and testing. The results are robust across both classification and regression affect modeling tasks, which are dominant paradigms in affective computing. Our findings make a decisive step towards realizing affect interaction in the wild.", "output": "From the Lab to the Wild: Affect Modeling via Privileged Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. The essence of Diff-Pruning is encapsulated in a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment, undertaken across four diverse datasets, highlights two primary benefits of our proposed method: 1) Efficiency: it enables approximately a 50% reduction in FLOPs at a mere 10% to 20% of the original training expenditure; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained progenitors. Code is available at ", "output": "Structural Pruning for Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution multispectral (LRMS) image to create a single high-resolution multispectral (HRMS) image.
Most of the existing deep learning-based pansharpening methods have poor generalization ability, and traditional model-based pansharpening methods need careful manual exploration of the image structure prior. To alleviate these issues, this paper proposes an unsupervised pansharpening method by combining the diffusion model with the low-rank matrix factorization technique. Specifically, we assume that the HRMS image is decomposed into the product of two low-rank tensors, i.e., the base tensor and the coefficient matrix. The base tensor lies in the image field and has a low spectral dimension; we can thus conveniently utilize a pre-trained remote sensing diffusion model to capture its image structures. Additionally, we derive a simple yet quite effective way to pre-estimate the coefficient matrix from the observed LRMS image, which preserves the spectral information of the HRMS. Extensive experimental results on benchmark datasets demonstrate that our proposed method performs better than traditional model-based approaches and has better generalization ability than deep learning-based techniques. The code is released at ", "output": "Unsupervised Pansharpening via Low-rank Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperbolic manifolds for visual representation learning allow for effective learning of semantic class hierarchies by naturally embedding tree-like structures with low distortion within a low-dimensional representation space. The highly separable semantic class hierarchies produced by hyperbolic learning have been shown to be powerful in low-shot tasks; however, their application in self-supervised learning is yet to be fully explored. In this work, we explore the use of hyperbolic representation space for self-supervised representation learning for prototype-based clustering approaches. First, we extend the Masked Siamese Networks to operate on the Poincaré ball model of hyperbolic space; secondly, we place prototypes on the ideal boundary of the Poincaré ball. Unlike previous methods, we project to the hyperbolic space at the output of the encoder network and utilise a hyperbolic projection head to ensure that the representations used for downstream tasks remain hyperbolic. Empirically, we demonstrate the ability of these methods to perform comparably to Euclidean methods in lower dimensions for linear evaluation tasks, whilst showing improvements in extreme few-shot learning tasks.", "output": "HMSN: Hyperbolic Self-Supervised Learning by Clustering with Ideal Prototypes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The adversarial patch attack aims to fool image classifiers within a bounded, contiguous region of arbitrary changes, posing a real threat to computer vision systems (e.g., autonomous driving, content moderation, biometric authentication, medical imaging) in the physical world. To address this problem in a trustworthy way, proposals have been made for certified patch defenses that ensure the robustness of classification models and prevent future patch attacks from breaching the defense. State-of-the-art certified defenses can be compatible with any model architecture, as well as achieve high clean and certified accuracy.
Although these methods are adaptive to arbitrary patch positions, they inevitably need to access the size of the adversarial patch, which is unreasonable and impractical in real-world attack scenarios. To improve the feasibility of the architecture-agnostic certified defense in a black-box setting (i.e., both the position and size of the patch are unknown), we propose a novel two-stage Iterative Black-box Certified Defense method, termed IBCD. In the first stage, it estimates the patch size in a search-based manner by evaluating the size relationship between the patch and the mask with pixel masking. In the second stage, the accuracy results are calculated by the existing white-box certified defense methods with the estimated patch size. The experiments conducted on two popular model architectures and two datasets verify the effectiveness and efficiency of IBCD.", "output": "Architecture-agnostic Iterative Black-box Certified Defense against Adversarial Patches."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reducing the number of bits needed to encode the weights and activations of neural networks is highly desirable, as it speeds up their training and inference time while reducing memory consumption. For these reasons, research in this area has attracted significant attention toward developing neural networks that leverage lower-precision computing, such as mixed-precision training. Interestingly, none of the existing approaches has investigated pure 16-bit floating-point settings. In this paper, we shed light on the overlooked efficiency of pure 16-bit floating-point neural networks. As such, we provide a comprehensive theoretical analysis to investigate the factors contributing to the differences observed between 16-bit and 32-bit models. We formalize the concepts of floating-point error and tolerance, enabling us to quantitatively explain the conditions under which a 16-bit model can closely approximate the results of its 32-bit counterpart. This theoretical exploration offers a perspective distinct from the literature, which attributes the success of low-precision neural networks to its regularization effect. This in-depth analysis is supported by an extensive series of experiments. Our findings demonstrate that pure 16-bit floating-point neural networks can achieve similar or even better performance than their mixed-precision and 32-bit counterparts. We believe the results presented in this paper will have significant implications for machine learning practitioners, offering an opportunity to reconsider using pure 16-bit networks in various applications.", "output": "In Defense of Pure 16-bit Floating-Point Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to \"drag\" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig.1.
To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.", "output": "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D object detection plays a crucial role in numerous intelligent vision systems. Detection in the open world inevitably encounters various adverse scenes, such as dense fog, heavy rain, and low-light conditions. Although existing efforts primarily focus on diversifying network architectures or training schemes, resulting in significant progress in 3D object detection, most of these learnable modules fail in adverse scenes, thereby hindering detection performance. To address this issue, this paper proposes a monocular 3D detection model designed to perceive twin depth in adverse scenes, termed MonoTDP, which effectively mitigates the degradation of detection performance in various harsh environments. Specifically, we first introduce an adaptive learning strategy to aid the model in handling uncontrollable weather conditions, significantly resisting degradation caused by various degrading factors. Then, to address the depth/content loss in adverse regions, we propose a novel twin depth perception module that simultaneously estimates scene and object depth, enabling the integration of scene-level and object-level features. Additionally, we assemble a new adverse 3D object detection dataset encompassing a wide range of challenging scenes, including rainy, foggy, and low-light weather conditions, with each type of scene containing 7,481 images. Experimental results demonstrate that our proposed method outperforms current state-of-the-art approaches by an average of 3.12% in terms of AP_R40 for the car category across various adverse environments.", "output": "MonoTDP: Twin Depth Perception for Monocular 3D Object Detection in Adverse Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ocular Toxoplasmosis (OT) is a common eye infection caused by T. gondii that can cause vision problems. Diagnosis is typically done through a clinical examination and imaging, but these methods can be complicated and costly, requiring trained personnel.
To address this issue, we have created a benchmark study that evaluates the effectiveness of existing pre-trained networks using transfer learning techniques to detect OT from fundus images. Furthermore, we have also analysed the performance of transfer-learning-based segmentation networks to segment lesions in the images. This research seeks to provide a guide for future researchers looking to utilise DL techniques and develop a cheap, automated, easy-to-use, and accurate diagnostic method. We have performed an in-depth analysis of different feature extraction techniques in order to find the most optimal one for OT classification and segmentation of lesions. For classification tasks, we have evaluated pre-trained models such as VGG16, MobileNetV2, InceptionV3, ResNet50, and DenseNet121. Among them, MobileNetV2 outperformed all other models in terms of Accuracy (Acc), Recall, and F1 Score, surpassing the second-best model, InceptionV3, by 0.7% in Acc. However, DenseNet121 achieved the best result in terms of Precision, which was 0.1% higher than MobileNetV2. For the segmentation task, this work has exploited the U-Net architecture. In order to utilize transfer learning, the encoder block of the traditional U-Net was replaced by MobileNetV2, InceptionV3, ResNet34, and VGG16 to evaluate different architectures; moreover, two different loss functions (Dice loss and Jaccard loss) were exploited in order to find the most optimal one. The MobileNetV2/U-Net outperformed ResNet34 by 0.5% and 2.1% in terms of Acc and Dice Score, respectively, when the Jaccard loss function is employed during training.", "output": "Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information. It is becoming more significant with the increasing advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process. To tackle this issue, we propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task, combining content and detailed information to generate multiple pseudo viewport sequences from a given starting point. Additionally, we design a Multi-scale Feature Aggregation (MFA) module with a Distortion-aware Block (DAB) to fuse distorted and semantic features of each viewport. We also devise TMM to learn the viewport transition in the temporal domain.
Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets.", "output": "Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects well blended with surrounding environments using sparsely-annotated data for model training. It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity and (2) the sparsely-annotated training data only provide weak supervision for model learning. In this paper, we propose a new WSCOS method to address these two challenges. To tackle the intrinsic similarity challenge, we design a multi-scale feature grouping module that first groups features at different granularities and then aggregates these grouping results. By grouping similar features together, it encourages segmentation coherence, helping obtain complete segmentation results for both single- and multiple-object images. For the weak supervision challenge, we utilize the recently-proposed vision foundation model, the Segment Anything Model (SAM), and use the provided sparse annotations as prompts to generate segmentation masks, which are used to train the model. To alleviate the impact of low-quality segmentation masks, we further propose a series of strategies, including multi-augmentation result ensemble, entropy-based pixel-level weighting, and entropy-based image-level selection. These strategies help provide more reliable supervision to train the segmentation model. We verify the effectiveness of our method on various WSCOS tasks, and experiments demonstrate that our method achieves state-of-the-art performance on these tasks.", "output": "Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in deep learning-based medical image segmentation studies achieve nearly human-level performance in a fully supervised manner. However, acquiring pixel-level expert annotations is extremely expensive and laborious in medical imaging fields. Unsupervised domain adaptation (UDA) can alleviate this problem, making it possible to use annotated data from one imaging modality to train a network that can successfully perform segmentation on a target imaging modality with no labels. In this work, we propose SDC-UDA, a simple yet effective volumetric UDA framework for slice-direction continuous cross-modality medical image segmentation, which combines intra- and inter-slice self-attentive image translation, uncertainty-constrained pseudo-label refinement, and volumetric self-training. Our method is distinguished from previous methods on UDA for medical image segmentation in that it can obtain continuous segmentation in the slice direction, thereby ensuring higher accuracy and potential in clinical practice.
We validate SDC-UDA with multiple publicly available cross-modality medical image segmentation datasets and achieve state-of-the-art segmentation performance, not to mention the superior slice-direction continuity of prediction compared to previous studies.", "output": "SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The objective of Audio-Visual Segmentation (AVS) is to locate sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks. In this paper, we present the following contributions: (i) we propose a scalable and annotation-free pipeline for generating artificial data for the AVS task. We leverage existing image segmentation and audio datasets to draw links between category labels, image-mask pairs, and audio samples, which allows us to easily compose (image, audio, mask) triplets for training AVS models; (ii) we introduce a novel Audio-Aware Transformer (AuTR) architecture that features an audio-aware query-based transformer decoder. This architecture enables the model to search for sounding objects with the guidance of audio signals, resulting in more accurate segmentation; (iii) we present extensive experiments conducted on both synthetic and real datasets, which demonstrate the effectiveness of training AVS models with synthetic data generated by our proposed pipeline. Additionally, our proposed AuTR architecture exhibits superior performance and strong generalization ability on public benchmarks. The project page is ", "output": "Annotation-free Audio-Visual Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of registration for medical CT images from a novel perspective -- the sensitivity to the degree of deformation in CT images. Although some learning-based methods have shown success in terms of average accuracy, their ability to handle regions with local large deformation (LLD) may significantly decrease compared to dealing with regions with minor deformation. This motivates our research into this issue. The two main causes of LLDs are organ motion and changes in tissue structure, with the latter often being a long-term process. In this paper, we propose a novel registration model called Cascade-Dilation Inter-Layer Differential Network (CDIDN), which exhibits both a high deformation impedance capability (DIC) and accuracy. CDIDN improves its resilience to LLDs in CT images by enhancing LLDs in the displacement field (DF). It uses a feature-based progressive decomposition of LLDs, blending feature flows of different levels into a main flow in a top-down manner. It leverages an Inter-Layer Differential Module (IDM) at each level to locally refine the main flow and globally smooth the feature flow, and it also integrates feature velocity fields that can effectively handle feature deformations of various degrees. We assess CDIDN using lungs as representative organs with large deformation. Our findings show that IDM significantly enhances LLDs of the DF, thereby improving the DIC and accuracy of the model. Compared with other outstanding learning-based methods, CDIDN exhibits the best DIC and excellent accuracy.
Based on vessel enhancement and the enhanced LLDs of the DF, we propose a novel method to accurately track the appearance, disappearance, enlargement, and shrinkage of pulmonary lesions, which effectively addresses the detection of early lesions and peripheral lung lesions, as well as issues of false enlargement, false shrinkage, and mutilation of lesions.", "output": "CDIDN: A Registration Model with High Deformation Impedance Capability for Long-Term Tracking of Pulmonary Lesion Dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Radiance Fields (NeRF) have demonstrated remarkable 3D reconstruction capabilities with dense-view images. However, their performance significantly deteriorates under sparse-view settings. We observe that learning the 3D consistency of pixels among different views is crucial for improving reconstruction quality in such cases. In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels. Specifically, ConsistentNeRF employs depth-derived geometry information and a depth-invariant loss to concentrate on pixels that exhibit 3D correspondence and maintain consistent depth relationships. Extensive experiments on recent representative works reveal that our approach can considerably enhance model performance in sparse-view conditions, achieving improvements of up to 94% in PSNR, 76% in SSIM, and 31% in LPIPS compared to the vanilla baselines across various benchmarks, including DTU, NeRF Synthetic, and LLFF.", "output": "ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual Question Answering (VQA) is an emerging area of interest for researchers, being a recent problem in natural language processing and image prediction. In this area, an algorithm needs to answer questions about certain images. As of the writing of this survey, 25 recent studies were analyzed. In addition, 6 datasets were analyzed, with their download links provided. In this work, several recent pieces of research in this area were investigated, and a deeper analysis and comparison among them were provided, including results, the state-of-the-art, common errors, and possible points of improvement for future researchers.", "output": "Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inspired by the traditional partial differential equation (PDE) approach for image denoising, we propose a novel neural network architecture, referred to as NODE-ImgNet, that combines neural ordinary differential equations (NODEs) with convolutional neural network (CNN) blocks. NODE-ImgNet is intrinsically a PDE model, where the dynamic system is learned implicitly without the explicit specification of the PDE. This naturally circumvents the typical issues associated with introducing artifacts during the learning process.
By invoking such a NODE structure, which can also be viewed as a continuous variant of a residual network (ResNet) and inherits its advantage in image denoising, our model achieves enhanced accuracy and parameter efficiency. In particular, our model exhibits consistent effectiveness in different scenarios, including denoising gray and color images perturbed by Gaussian noise, as well as real-noisy images, and demonstrates superiority in learning from small image datasets.", "output": "NODE-ImgNet: a PDE-informed effective and robust model for image denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Past work demonstrated that using neural networks, we can extend unfinished music pieces while maintaining the music style of the musician. With recent advancements in large language models and diffusion models, we are now capable of generating comics with an interesting storyline while maintaining the art style of the artist. In this paper, we used ChatGPT to generate storylines and dialogue and then generated the comic using Stable Diffusion. We introduced a novel way to evaluate AI-generated stories, and we achieved SOTA performance on character fidelity and art style by fine-tuning Stable Diffusion using LoRA, ControlNet, etc.", "output": "Generating coherent comic with rich story using ChatGPT and Stable Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Contemporary face detection algorithms have to deal with many challenges such as variations in pose, illumination, and scale. A subclass of the face detection problem that has recently gained increasing attention is occluded face detection, or more specifically, the detection of masked faces. Three years on since the advent of the COVID-19 pandemic, there is still a complete lack of evidence regarding how well existing face detection algorithms perform on masked faces. This article first offers a brief review of state-of-the-art face detectors and detectors made for the masked face problem, along with a review of the existing masked face datasets. We evaluate and compare the performances of a well-representative set of face detectors at masked face detection and conclude with a discussion on the possible contributing factors to their performance.", "output": "A Comparative Study of Face Detection Algorithms for Masked Face Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent progress in generative models has resulted in models that produce both realistic and relevant images for most textual inputs. These models are being used to generate millions of images every day, and hold the potential to drastically impact areas such as generative art, digital marketing and data augmentation. Given their outsized impact, it is important to ensure that the generated content reflects the artifacts and surroundings across the globe, rather than over-representing certain parts of the world. In this paper, we measure the geographical representativeness of common nouns (e.g., a house) generated through DALL.E 2 and Stable Diffusion models using a crowdsourced study comprising 540 participants across 27 countries.
For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States followed by India, and the top generations rarely reflect surroundings from all other countries (average score less than 3 out of 5). Specifying the country names in the input increases the representativeness by 1.44 points on average for DALL.E 2 and 0.75 for Stable Diffusion; however, the overall scores for many countries still remain low, highlighting the need for future models to be more geographically inclusive. Lastly, we examine the feasibility of quantifying the geographical representativeness of generated images without conducting user studies.", "output": "Inspecting the Geographical Representativeness of Images from Text-to-Image Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Typical generative diffusion models rely on a Gaussian diffusion process for training the backward transformations, which can then be used to generate samples from Gaussian noise. However, real world data often takes place in discrete-state spaces, including many scientific applications. Here, we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact (as opposed to variational) analysis. We relate the theory to the existing continuous-state Gaussian diffusion as well as other approaches to discrete diffusion, and identify the corresponding reverse-time stochastic process and score function in the continuous-time setting, and the reverse-time mapping in the discrete-time setting. As an example of this framework, we introduce 'Blackout Diffusion', which learns to produce samples from an empty image instead of from noise. Numerical experiments on the CIFAR-10, Binarized MNIST, and CelebA datasets confirm the feasibility of our approach. Generalizing from specific (Gaussian) forward processes to discrete-state processes without a variational approximation sheds light on how to interpret diffusion models, which we discuss.", "output": "Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Foundation models (e.g., CLIP or DINOv2) have shown their impressive learning and transferring capabilities on a wide range of visual tasks, by training on a large corpus of data and adapting to specific downstream tasks. It is, however, interesting that foundation models have not been fully explored for universal domain adaptation (UniDA), which is to learn models using labeled data in a source domain and unlabeled data in a target one, such that the learned models can successfully adapt to the target data. In this paper, we make comprehensive empirical studies of state-of-the-art UniDA methods using foundation models. We first demonstrate that, while foundation models greatly improve the performance of the baseline methods that train the models on the source data alone, existing UniDA methods generally fail to improve over the baseline. This suggests that new research efforts are very necessary for UniDA using foundation models. To this end, we propose a very simple method of target data distillation on the CLIP model, which achieves consistent improvement over the baseline across all the UniDA benchmarks.
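The Blackout Diffusion record above describes a discrete-state forward process that drives images toward an empty state rather than Gaussian noise. Below is a minimal sketch of one such process, assuming PyTorch: integer pixel counts are thinned by independent binomial draws until the image empties. The per-step survival probability is an illustrative assumption; the paper's exact forward process may differ.

```python
# Sketch: a discrete-state "blackout"-style forward process. Each integer
# pixel count survives independently with probability keep_prob per step,
# so the image decays toward the empty state instead of Gaussian noise.
import torch

torch.manual_seed(0)
x = torch.randint(0, 256, (1, 28, 28))        # integer-valued image
keep_prob = 0.9                               # illustrative survival probability

states = [x]
for t in range(40):
    probs = torch.full_like(x.float(), keep_prob)
    x = torch.binomial(x.float(), probs).long()   # binomial thinning
    states.append(x)

print(float(states[0].float().mean()), float(states[-1].float().mean()))
# mean pixel count decays toward 0 as t grows
```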
Our studies are under a newly proposed evaluation metric of universal classification rate (UCR), which is threshold- and ratio-free and addresses the threshold-sensitive issue encountered when using the existing H-score metric.", "output": "Universal Domain Adaptation from Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Speech-driven gesture generation is highly challenging due to the random jitters of human motion. In addition, there is an inherent asynchronous relationship between human speech and gestures. To tackle these challenges, we introduce a novel quantization-based and phase-guided motion-matching framework. Specifically, we first present a gesture VQ-VAE module to learn a codebook to summarize meaningful gesture units. With each code representing a unique gesture, random jittering problems are alleviated effectively. We then use Levenshtein distance to align diverse gestures with different speech. Levenshtein distance based on audio quantization, used as a similarity metric between speech and gestures, helps match more appropriate gestures with speech and solves the alignment problem of speech and gestures well. Moreover, we introduce phase to guide the optimal gesture matching based on the semantics of context or the rhythm of audio. Phase guides when text-based or speech-based gestures should be performed to make the generated gestures more natural. Extensive experiments show that our method outperforms recent approaches on speech-driven gesture generation. Our code, database, pre-trained models, and demos are available at ", "output": "QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unifying semi-supervised learning (SSL) and open-set recognition into a single learning policy would facilitate the development of cost-efficient and application-grade classifiers. However, previous attempts do not clarify the difference between unobserved novel categories (those only seen during testing) and observed novel categories (those present in unlabelled training data). This study introduces Open-Set Learning with Augmented Category by Exploiting Unlabelled Data (Open-LACU), the first policy that generalises between both novel category types. We adapt the state-of-the-art OSR method of Margin Generative Adversarial Networks (Margin-GANs) into several Open-LACU configurations, setting the benchmarks for Open-LACU and offering unique insights into novelty detection using Margin-GANs. Finally, we highlight the significance of the Open-LACU policy by discussing the applications of semantic segmentation in remote sensing, object detection in radiology and disease identification through cough analysis.
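The QPGesture record above matches VQ-VAE gesture codes to quantized audio via Levenshtein distance. The snippet below is a plain dynamic-programming edit distance over hypothetical code sequences, shown only to make the alignment signal concrete; the actual codebooks would come from the VQ-VAE and audio quantizer described in the abstract.

```python
# Sketch: Levenshtein (edit) distance between two quantized code sequences,
# usable as a similarity signal when matching gesture codes to speech codes.
def levenshtein(a: list[int], b: list[int]) -> int:
    """Classic single-row dynamic-programming edit distance."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution
            prev = cur
    return dp[n]


speech_codes = [3, 1, 4, 1, 5]   # hypothetical audio codebook indices
gesture_codes = [3, 1, 1, 5]     # hypothetical gesture codebook indices
print(levenshtein(speech_codes, gesture_codes))  # 1: one deletion aligns them
```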
These applications include observed and unobserved novel categories, making Open-LACU essential for training classifiers in these big data domains.", "output": "Unified machine learning: Open-set learning with augmented category by exploiting unlabelled data (Open-LACU)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN) for fast and accurate single-shot object detection. Feature Pyramid (FP) is widely used in recent visual detection; however, the top-down pathway of FP cannot preserve accurate localization due to pooling shifting. The advantage of FP is weakened as deeper backbones with more layers are used. In addition, it cannot keep up accurate detection of both small and large objects at the same time. To address these issues, we propose a new parallel FP structure with bi-directional (top-down and bottom-up) fusion and associated improvements to retain high-quality features for accurate localization. We provide the following design improvements: (1) A parallel bifusion FP structure with a bottom-up fusion module (BFM) to detect both small and large objects at once with high accuracy. (2) A concatenation and re-organization (CORE) module provides a bottom-up pathway for feature fusion, which leads to the bi-directional fusion FP that can recover lost information from lower-layer feature maps. (3) The CORE feature is further purified to retain richer contextual information. Such CORE purification in both top-down and bottom-up pathways can be finished in only a few iterations. (4) The addition of a residual design to CORE leads to a new Re-CORE module that enables easy training and integration with a wide range of deeper or lighter backbones. The proposed network achieves state-of-the-art performance on the UAVDT17 and MS COCO datasets. Code is available at ", "output": "Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is often the case that data come with multiple views in real-world applications. Fully exploring the information of each view is significant for making data more representative. However, due to various limitations and failures in data collection and pre-processing, it is inevitable for real data to suffer from view missing and data scarcity. The coexistence of these two issues makes it more challenging to achieve the pattern classification task. Currently, to the best of our knowledge, few appropriate methods can handle these two issues well simultaneously. Aiming to draw more attention from the community to this challenge, we propose a new task in this paper, called few-shot partial multi-view learning, which focuses on overcoming the negative impact of the view-missing issue in the low-data regime. The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; (ii) the limited amount of data exacerbates information scarcity, thus making it harder to address the view-missing issue in turn. To address these challenges, we propose a new unified Gaussian dense-anchoring method.
The unified dense anchors are learned for the limited partial multi-view data, thereby anchoring them into a unified dense representation space where the influence of data scarcity and view missing can be alleviated. We conduct extensive experiments to evaluate our method. The results on Cub-googlenet-doc2vec, Handwritten, Caltech102, Scene15, Animal, ORL, tieredImagenet, and Birds-200-2011 datasets validate its effectiveness.", "output": "Few-shot Partial Multi-view Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face manipulation technology is advancing very rapidly, and new methods are being proposed day by day. The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world. Our key insight is that each person has specific characteristics that a synthetic generator likely cannot reproduce. Accordingly, we extract audio-visual features which characterize the identity of a person, and use them to create a person-of-interest (POI) deepfake detector. We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity. As a result, when the video and/or audio of a person is manipulated, its representation in the embedding space becomes inconsistent with the real identity, allowing reliable detection. Training is carried out exclusively on real talking-face video; thus, the detector does not depend on any specific manipulation method and yields the highest generalization ability. In addition, our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos. Experiments on a wide variety of datasets confirm that our method ensures SOTA performance, especially on low-quality videos. Code is publicly available online at ", "output": "Audio-Visual Person-of-Interest DeepFake Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing body of work has shown that deep neural networks are susceptible to adversarial examples. These take the form of small perturbations applied to the model's input which lead to incorrect predictions. Unfortunately, most literature focuses on visually imperceivable perturbations to be applied to digital images that often are, by design, impossible to deploy to physical targets. We present Adversarial Scratches: a novel L0 black-box attack, which takes the form of scratches in images, and which possesses much greater deployability than other state-of-the-art attacks. Adversarial Scratches leverage Bézier curves to reduce the dimension of the search space and possibly constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs.
Results show that, often, our attack achieves a higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.", "output": "Adversarial Scratches: Deployable Attacks to CNN Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based models have delivered impressive results on many tasks, particularly vision and language tasks. In many model training situations, conventional configurations are typically adopted. For example, we often set the base model with hidden dimensions (i.e. model width) to be 768 and the number of transformer layers (i.e. model depth) to be 12. In this paper, we revisit these conventional configurations. Through theoretical analysis and experimental evaluation, we show that the masked autoencoder is effective in alleviating the over-smoothing issue in deep transformer training. Based on this finding, we propose Bamboo, an idea of using deeper and narrower transformer configurations, for masked autoencoder training. On ImageNet, with such a simple change in configuration, the re-designed model achieves 87.1% top-1 accuracy and outperforms SoTA models like MAE and BEiT. On language tasks, the re-designed model outperforms BERT with the default setting by 1.1 points on average on GLUE datasets.", "output": "A Study on Transformer Configuration and Training Objective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a clinical setting it is essential that deployed image processing systems are robust to the full range of inputs they might encounter and, in particular, do not make confidently wrong predictions. The most popular approach to safe processing is to train networks that can provide a measure of their uncertainty, but these tend to fail for inputs that are far outside the training data distribution. Recently, generative modelling approaches have been proposed as an alternative; these can quantify the likelihood of a data sample explicitly, filtering out any out-of-distribution (OOD) samples before further processing is performed. In this work, we focus on image segmentation and evaluate several approaches to network uncertainty in the far-OOD and near-OOD cases for the task of segmenting haemorrhages in head CTs. We find all of these approaches are unsuitable for safe segmentation as they provide confidently wrong predictions when operating OOD. We propose performing full 3D OOD detection using a VQ-GAN to provide a compressed latent representation of the image and a transformer to estimate the data likelihood. Our approach successfully identifies images in both the far- and near-OOD cases. We find a strong relationship between image likelihood and the quality of a model's segmentation, making this approach viable for filtering images unsuitable for segmentation. To our knowledge, this is the first time transformers have been applied to perform OOD detection on 3D image data.
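The Adversarial Scratches record above parameterizes perturbations as Bézier curves. Below is a small sketch that rasterizes a quadratic Bézier onto an image as a scratch; the control points and color are stand-ins for the variables a black-box optimizer would search over, not values from the paper.

```python
# Sketch: draw a quadratic Bezier curve onto an image as a "scratch"
# perturbation. In a real attack, p0/p1/p2 and the colour would be the
# low-dimensional search space for a query-based black-box optimiser.
import numpy as np


def bezier_scratch(img: np.ndarray, p0, p1, p2, color=(255, 0, 0), steps=200):
    """Rasterise B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 for t in [0, 1]."""
    out = img.copy()
    h, w = out.shape[:2]
    for t in np.linspace(0.0, 1.0, steps):
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:       # clip points outside the image
            out[yi, xi] = color
    return out


clean = np.zeros((64, 64, 3), dtype=np.uint8)
scratched = bezier_scratch(clean, (5, 60), (32, -10), (60, 55))
print(int((scratched != clean).any(axis=-1).sum()), "pixels modified")
```

Three control points give the attack only six spatial parameters plus a colour, which is what makes the search space small enough for query-efficient black-box optimization.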
Code is available at github.com/marksgraham/transformer-ood.", "output": "Transformer-based out-of-distribution detection for clinically safe segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we address the problem of 4D facial expression generation. This is usually addressed by animating a neutral 3D face to reach an expression peak, and then getting back to the neutral state. In the real world though, people show more complex expressions, and switch from one expression to another. We thus propose a new model that generates transitions between different expressions, and synthesizes long and composed 4D expressions. This involves three sub-problems: (i) modeling the temporal dynamics of expressions, (ii) learning transitions between them, and (iii) deforming a generic mesh. We propose to encode the temporal evolution of expressions using the motion of a set of 3D landmarks, which we learn to generate by training a manifold-valued GAN (Motion3DGAN). To allow the generation of composed expressions, this model accepts two labels encoding the starting and the ending expressions. The final sequence of meshes is generated by a Sparse2Dense mesh Decoder (S2D-Dec) that maps the landmark displacements to a dense, per-vertex displacement of a known mesh topology. By explicitly working with motion trajectories, the model is totally independent of the identity. Extensive experiments on five public datasets show that our proposed approach brings significant improvements with respect to previous solutions, while retaining good generalization to unseen data.", "output": "Generating Multiple 4D Expression Transitions by Learning Face Landmark Trajectories."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Changes to hyperparameters can have a dramatic effect on model accuracy. Thus, the tuning of hyperparameters plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the evaluation of model checkpoints, which is done through the use of \"validators\". In a supervised setting, these validators evaluate checkpoints by computing accuracy on a validation set that has labels. In contrast, in an unsupervised setting, the validation set has no such labels. Without any labels, it is impossible to compute accuracy, so validators must estimate accuracy instead. But what is the best approach to estimating accuracy? In this paper, we consider this question in the context of unsupervised domain adaptation (UDA). Specifically, we propose three new validators, and we compare and rank them against five other existing validators, on a large dataset of 1,000,000 checkpoints. Extensive experimental results show that two of our proposed validators achieve state-of-the-art performance in various settings. Finally, we find that in many cases, the state-of-the-art is obtained by a simple baseline method. To the best of our knowledge, this is the largest empirical study of UDA validators to date.
Code is available at", "output": "Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cross-view geolocalization, a supplement or replacement for GPS, localizes an agent within a search area by matching images taken from a ground-view camera to overhead images taken from satellites or aircraft. Although the viewpoint disparity between ground and overhead images makes cross-view geolocalization challenging, significant progress has been made assuming that the ground agent has access to a panoramic camera. For example, our prior work (WAG) introduced changes in search area discretization, training loss, and particle filter weighting that enabled city-scale panoramic cross-view geolocalization. However, panoramic cameras are not widely used in existing robotic platforms due to their complexity and cost. Non-panoramic cross-view geolocalization is more applicable for robotics, but is also more challenging. This paper presents Restricted FOV Wide-Area Geolocalization (ReWAG), a cross-view geolocalization approach that generalizes WAG for use with standard, non-panoramic ground cameras by creating pose-aware embeddings and providing a strategy to incorporate particle pose into the Siamese network. ReWAG is a neural network and particle filter system that is able to globally localize a mobile agent in a GPS-denied environment with only odometry and a 90 degree FOV camera, achieving similar localization accuracy to what WAG achieved with a panoramic camera and improving localization accuracy by a factor of 100 compared to a baseline vision transformer (ViT) approach. A video highlight that demonstrates ReWAG's convergence on a test path of several dozen kilometers is available at", "output": "Wide-Area Geolocalization with a Limited Field of View Camera."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning-enabled medical imaging analysis has become a vital part of current automatic diagnosis systems. However, machine learning models have been shown to demonstrate a systematic bias towards certain subgroups of people, e.g., giving a worse predictive performance to old females. It is harmful and dangerous in such a sensitive area, and therefore researchers have been working on developing bias mitigation algorithms to address the fairness issue in the general machine learning field. However, given the specific characteristics of medical imaging, fairness in medical image analysis (MedIA) requires additional efforts. Hence, in this survey, we give a comprehensive review of the current progress of fairness studies in general and in MedIA in particular. Specifically, we first discuss the definitions of fairness and analyze the source of bias in medical imaging. Then, we discuss current research on fairness for MedIA and present a collection of public medical imaging datasets that can be used for evaluating fairness in MedIA. Furthermore, we conduct extensive experiments to evaluate the fairness of several different tasks for medical imaging, including classification, object detection, and landmark detection.
Finally, we discuss the challenges and potential future directions in developing fair MedIA.", "output": "Progress and Prospects for Fairness in Healthcare and Medical Image Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Chest X-rays (CXRs) are a widely used imaging modality for the diagnosis and prognosis of lung disease. The image analysis tasks vary. Examples include pathology detection and lung segmentation. There is a large body of work where machine learning algorithms are developed for specific tasks. A significant recent example is Coronavirus disease (covid-19) detection using CXR data. However, the traditional diagnostic tool design methods based on supervised learning are burdened by the need to provide training data annotation, which should be of good quality for better clinical outcomes. Here, we propose an alternative solution, a new self-supervised paradigm, where a general representation from CXRs is learned using a group-masked self-supervised framework. The pre-trained model is then fine-tuned for domain-specific tasks such as covid-19, pneumonia detection, and general health screening. We show that the same pre-training can be used for the lung segmentation task. Our proposed paradigm shows robust performance in multiple downstream tasks, which demonstrates the success of the pre-training. Moreover, the performance of the pre-trained models on data with significant drift during test time proves the learning of a better generic representation. The methods are further validated by covid-19 detection in a unique small-scale pediatric data set. The performance gain in accuracy (~25%) is significant when compared to a supervised transformer-based method. This adds credence to the strength and reliability of our proposed framework and pre-training strategy.", "output": "SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain Specific Foundation Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the approximation of shift-invariant or equivariant functions by deep fully convolutional networks from the dynamical systems perspective. We prove that deep residual fully convolutional networks and their continuous-layer counterpart can achieve universal approximation of these symmetric functions at constant channel width. Moreover, we show that the same can be achieved by non-residual variants with at least 2 channels in each layer and convolutional kernel size of at least 2. In addition, we show that these requirements are necessary, in the sense that networks with fewer channels or smaller kernels fail to be universal approximators.", "output": "On the Universal Approximation Property of Deep Fully Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter.
Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the accumulative gradients and hence the learning rate of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters in every shared layer well.", "output": "AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Properly-calibrated sensors are the prerequisite for a dependable autonomous driving system. However, most prior methods focus on extrinsic calibration between sensors, and few focus on the misalignment between the sensors and the vehicle coordinate system. Existing targetless approaches rely on specific prior knowledge, such as driving routes and road features, to handle this misalignment. This work removes these limitations and proposes more general calibration methods for four commonly used sensors: Camera, LiDAR, GNSS/INS, and millimeter-wave Radar. By utilizing sensor-specific patterns -- image features, 3D LiDAR points, GNSS/INS-solved pose, and radar speed -- we design four corresponding methods to mainly calibrate the rotation from sensor to car during normal driving within minutes, composing a toolbox named SensorX2car. Real-world and simulated experiments demonstrate the practicality of our proposed methods. Meanwhile, the related codes have been open-sourced to benefit the community. To the best of our knowledge, SensorX2car is the first open-source sensor-to-car calibration toolbox. The code is available at", "output": "SensorX2car: Sensors-to-car calibration for autonomous driving in road scenarios."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Plant phenotyping is typically a time-consuming and expensive endeavor, requiring large groups of researchers to meticulously measure biologically relevant plant traits, and is the main bottleneck in understanding plant adaptation and the genetic architecture underlying complex traits at population scale. In this work, we address these challenges by leveraging few-shot learning with convolutional neural networks (CNNs) to segment the leaf body and visible venation of 2,906 P. trichocarpa leaf images obtained in the field. In contrast to previous methods, our approach (i) does not require experimental or image pre-processing, (ii) uses the raw RGB images at full resolution, and (iii) requires very few samples for training (e.g., just eight images for vein segmentation).
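The AdaTask record above measures per-parameter task dominance via an exponentially decaying average of squared updates (AU). A toy sketch of that bookkeeping follows, assuming PyTorch, with a hypothetical decay factor and synthetic per-task updates rather than the paper's exact procedure.

```python
# Sketch: track an exponentially decaying average of squared per-task
# updates (AU) on a shared parameter, then read off task dominance.
# beta and the synthetic update magnitudes are illustrative assumptions.
import torch


def update_au(au: torch.Tensor, task_update: torch.Tensor, beta: float = 0.99):
    """au <- beta * au + (1 - beta) * update^2, elementwise."""
    return beta * au + (1 - beta) * task_update.pow(2)


param_shape = (4,)
au_per_task = {t: torch.zeros(param_shape) for t in ("task_a", "task_b")}
for step in range(100):
    # Stand-ins for the per-task gradient updates applied to one shared
    # parameter at this step; task_a dominates by construction.
    updates = {"task_a": torch.randn(param_shape) * 1.0,
               "task_b": torch.randn(param_shape) * 0.1}
    for t, u in updates.items():
        au_per_task[t] = update_au(au_per_task[t], u)

dominance = au_per_task["task_a"] / (au_per_task["task_a"] + au_per_task["task_b"])
print(dominance)  # close to 1.0 on every coordinate -> task_a dominates
```

Separating the accumulated statistics per task, as the abstract describes, would then let an AdaGrad/RMSProp/Adam-style rule scale each task's learning rate independently on each shared parameter.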
Traits relating to leaf morphology and vein topology are extracted from the resulting segmentations using traditional open-source image-processing tools, validated using real-world physical measurements, and used to conduct a genome-wide association study to identify genes controlling the traits. In this way, the current work is designed to provide the plant phenotyping community with (i) methods for fast and accurate image-based feature extraction that require minimal training data, and (ii) a new population-scale data set, including 68 different leaf phenotypes, for domain scientists and machine learning researchers. All of the few-shot learning code, data, and results are made publicly available.", "output": "Few-Shot Learning Enables Population-Scale Analysis of Leaf Traits in Populus trichocarpa."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Noisy partial label learning (noisy PLL) is an important branch of weakly supervised learning. Unlike PLL, where the ground-truth label must be concealed in the candidate label set, noisy PLL relaxes this constraint and allows the ground-truth label to be absent from the candidate label set. To address this challenging problem, most of the existing works attempt to detect noisy samples and estimate the ground-truth label for each noisy sample. However, detection errors are unavoidable. These errors can accumulate during training and continuously affect model optimization. To this end, we propose a novel framework for noisy PLL with theoretical guarantees, called 'Adjusting Label Importance Mechanism (ALIM)'. It aims to reduce the negative impact of detection errors by trading off the initial candidate set and model outputs. ALIM is a plug-in strategy that can be integrated with existing PLL approaches. Experimental results on benchmark datasets demonstrate that our method can achieve state-of-the-art performance on noisy PLL. Our code can be found in the Supplementary Material.", "output": "ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scenes, is challenging due to frequent tracking failures between camera frames and high data collection costs. Recently, spherical images have been extensively used due to the capability of recording surrounding environments from one camera exposure. In contrast to perspective images with limited FOV (Field of View), spherical images can cover the whole scene with full horizontal and vertical FOV and facilitate camera tracking and data acquisition in these complex scenes. With the rapid evolution and extensive use of professional and consumer-grade spherical cameras, spherical images show great potential for the 3D modeling of urban and indoor scenes. Classical 3D reconstruction pipelines, however, cannot be directly used for spherical images.
Besides, there exist few software packages that are designed for the 3D reconstruction of spherical images. As a result, this research provides a thorough survey of the state-of-the-art for 3D reconstruction of spherical images in terms of data acquisition, feature detection and matching, image orientation, and dense matching, as well as presenting promising applications and discussing potential prospects. We anticipate that this study offers insightful clues to direct future research.", "output": "3D reconstruction from spherical images: A review of techniques, applications, and prospects."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tensor completion is important to many areas such as computer vision, data analysis, and signal processing. Enforcing low-rank structures on completed tensors, a category of methods known as low-rank tensor completion, has recently been studied extensively. Whilst such methods attained great success, none considered exploiting numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents the algorithms from reaching optimal accuracy. This work attempts to construct a new methodological framework called GCDTC (Generalized CP Decomposition Tensor Completion) for leveraging numerical priors and achieving higher accuracy in tensor completion. In this newly introduced framework, a generalized form of CP Decomposition is applied to low-rank tensor completion. This paper also proposes an algorithm known as SPTC (Smooth Poisson Tensor Completion) for nonnegative integer tensor completion as an instantiation of the GCDTC framework. A series of experiments on real-world data indicate that SPTC could produce results superior in completion accuracy to current state-of-the-art methods. Related code is available in the supplemental materials.", "output": "Exploring Numerical Priors for Low-Rank Tensor Completion with Generalized CP Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Event cameras are neuromorphic vision sensors representing visual information as sparse and asynchronous event streams. Most state-of-the-art event-based methods project events into dense frames and process them with conventional learning models. However, these approaches sacrifice the sparsity and high temporal resolution of event data, resulting in a large model size and high computational complexity. To fit the sparse nature of events and sufficiently explore the relationship between them, we develop a novel attention-aware model named Event Voxel Set Transformer (EVSTr) for spatiotemporal representation learning on event streams. It first converts the event stream into voxel sets and then hierarchically aggregates voxel features to obtain robust representations. The core of EVSTr is an event voxel transformer encoder to extract discriminative spatiotemporal features, which consists of two well-designed components, including a Multi-Scale Neighbor Embedding Layer (MNEL) for local information aggregation and a Voxel Self-Attention Layer (VSAL) for global feature interactions. To enable the network to incorporate a long-range temporal structure, we introduce a segment modeling strategy to learn motion patterns from a sequence of segmented voxel sets.
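The GCDTC/SPTC record above instantiates generalized CP decomposition with a Poisson observation model for nonnegative integer tensors. The sketch below fits CP factors to observed entries under a Poisson negative log-likelihood with plain Adam; the rank, softplus link, and optimizer are illustrative stand-ins for the paper's algorithm, assuming PyTorch.

```python
# Sketch: CP-style tensor completion with a Poisson observation model.
# Observed nonnegative-integer entries are fit by a rank-3 CP model; the
# fitted rates impute the missing entries.
import torch

torch.manual_seed(0)
shape, rank = (10, 8, 6), 3
true = torch.poisson(torch.rand(shape) * 4)            # nonnegative integers
mask = torch.rand(shape) < 0.5                          # observed entries

factors = [torch.randn(n, rank, requires_grad=True) for n in shape]
opt = torch.optim.Adam(factors, lr=0.05)

for step in range(500):
    a, b, c = factors
    recon = torch.einsum("ir,jr,kr->ijk", a, b, c)      # CP reconstruction
    rate = torch.nn.functional.softplus(recon)          # keep Poisson rate > 0
    # Poisson negative log-likelihood on observed entries only.
    nll = (rate - true * torch.log(rate + 1e-8))[mask].mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

print(float(nll))   # training NLL; unobserved entries are imputed by `rate`
```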
We evaluate the proposed model on two event-based recognition tasks: object classification and action recognition. Comprehensive experiments show that EVSTr achieves state-of-the-art performance while maintaining low model complexity. Additionally, we present a new dataset (NeuroHAR) recorded in challenging visual scenarios to complement the lack of real-world event-based datasets for action recognition.", "output": "Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc. In practice, the quality of CGIs consistently suffers from poor rendering during production, inevitable compression artifacts during the transmission of multimedia applications, and low aesthetic quality resulting from poor composition and design. However, few works have been dedicated to dealing with the challenge of computer graphics image quality assessment (CGIQA). Most image quality assessment (IQA) metrics are developed for natural scene images (NSIs) and validated on databases consisting of NSIs with synthetic distortions, which are not suitable for in-the-wild CGIs. To bridge the gap between evaluating the quality of NSIs and CGIs, we construct a large-scale in-the-wild CGIQA database consisting of 6,000 CGIs (CGIQA-6k) and carry out a subjective experiment in a well-controlled laboratory environment to obtain accurate perceptual ratings of the CGIs. Then, we propose an effective deep learning-based no-reference (NR) IQA model by utilizing both distortion and aesthetic quality representation. Experimental results show that the proposed method outperforms all other state-of-the-art NR IQA methods on the constructed CGIQA-6k database and other CGIQA-related databases. The database will be released to facilitate further research.", "output": "Subjective and Objective Quality Assessment for in-the-Wild Computer Graphics Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our paper seeks to transfer the hairstyle of a reference image to an input photo for virtual hair try-on. We target a variety of challenging scenarios, such as transforming a long hairstyle with bangs to a pixie cut, which requires removing the existing hair and inferring how the forehead would look, or transferring partially visible hair from a hat-wearing person in a different pose. Past solutions leverage StyleGAN for hallucinating any missing parts and producing a seamless face-hair composite through so-called GAN inversion or projection. However, there remains a challenge in controlling the hallucinations to accurately transfer hairstyle and preserve the face shape and identity of the input. To overcome this, we propose a multi-view optimization framework that uses \"two different views\" of reference composites to semantically guide occluded or ambiguous regions. Our optimization shares information between two poses, which allows us to produce high-fidelity and realistic results from incomplete references.
Our framework produces high-quality results and outperforms prior work in a user study that consists of significantly more challenging hair transfer scenarios than previously studied. Project page: ", "output": "StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapidly increasing demand for oriented object detection, e.g. in autonomous driving and remote sensing, the recently proposed paradigm involving the weakly-supervised detector H2RBox for learning the rotated box (RBox) from the (currently) more readily-available horizontal box (HBox) has shown promise. This paper presents H2RBox-v2, to further bridge the gap between HBox-supervised and RBox-supervised oriented object detection. Specifically, we propose to leverage reflection symmetry via flip and rotate consistencies, using a weakly-supervised network branch similar to H2RBox, together with a novel self-supervised branch that learns orientations from the symmetry inherent in visual objects. The detector is further stabilized and enhanced by practical techniques to cope with peripheral issues, e.g. angular periodicity. To our best knowledge, H2RBox-v2 is the first symmetry-aware self-supervised paradigm for oriented object detection. In particular, our method shows less susceptibility to low-quality annotation and insufficient training data compared to H2RBox. Specifically, H2RBox-v2 achieves very close performance to a rotation-annotation-trained counterpart -- Rotated FCOS: 1) DOTA-v1.0/1.5/2.0: 72.31%/64.76%/50.33% vs. 72.44%/64.53%/51.77%; 2) HRSC: 89.66% vs. 88.99%; 3) FAIR1M: 42.27% vs. 41.25%.", "output": "H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents our 3rd place solution in both the Descriptor Track and the Matching Track of the Meta AI Video Similarity Challenge (VSC2022), a competition aimed at detecting video copies. Our approach builds upon existing image copy detection techniques and incorporates several strategies to exploit the properties of video data, resulting in a simple yet powerful solution. By employing our proposed method, we achieved substantial improvements in accuracy compared to the baseline results (Descriptor Track: 38% improvement, Matching Track: 60% improvement). Our code is publicly available here:", "output": "3rd Place Solution to Meta AI Video Similarity Challenge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It designed a novel promptable segmentation task, ensuring zero-shot image segmentation using the pre-trained model via two main modes, including automatic everything and manual prompt. SAM has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging due to the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales.
Meanwhile, zero-shot and efficient MIS can greatly reduce the annotation time and boost the development of medical image analysis. Hence, SAM seems to be a potential tool and its performance on large medical datasets should be further validated. We collected and sorted 52 open-source datasets, and built a large medical segmentation dataset with 16 modalities, 68 objects, and 553K slices. We conducted a comprehensive analysis of different SAM testing strategies on the so-called COSMOS 553K dataset. Extensive experiments validate that SAM performs better with manual hints like points and boxes for object perception in medical images, leading to better performance in prompt mode compared to everything mode. Additionally, SAM shows remarkable performance on some specific objects and modalities, but is imperfect or even fails totally in other situations. Finally, we analyze the influence of different factors (e.g., the Fourier-based boundary complexity and the size of the segmented objects) on SAM's segmentation performance. Extensive experiments validate that SAM's zero-shot segmentation capability is not sufficient to ensure its direct application to MIS.", "output": "Segment Anything Model for Medical Images?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce ARBEx, a novel attentive feature extraction framework driven by a Vision Transformer with reliability balancing to cope with poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We reinforce several data pre-processing and refinement methods along with a window-based cross-attention ViT to squeeze the best out of the data. We also employ learnable anchor points in the embedding space with label distributions and a multi-head self-attention mechanism to optimize performance against weak predictions with reliability balancing, which is a strategy that leverages anchor points, attention scores, and confidence values to enhance the resilience of label predictions. To ensure correct label classification and improve the model's discriminative power, we introduce anchor loss, which encourages large margins between anchor points. Additionally, the multi-head self-attention mechanism, which is also trainable, plays an integral role in identifying accurate labels. This approach provides critical elements for improving the reliability of predictions and has a substantial positive effect on final prediction capabilities. Our adaptive model can be integrated with any deep neural network to forestall challenges in various recognition tasks. Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.", "output": "ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.
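The SAM evaluation record above contrasts prompt mode with everything mode. The sketch below shows both modes via the segment-anything package, with a placeholder image, a hypothetical foreground point, and a checkpoint path that is assumed to have been downloaded beforehand.

```python
# Sketch: the two SAM usage modes compared in the abstract above.
# The blank array stands in for an RGB-rendered CT slice; the point
# coordinates are a hypothetical lesion location.
import numpy as np
from segment_anything import (SamAutomaticMaskGenerator, SamPredictor,
                              sam_model_registry)

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

image = np.zeros((512, 512, 3), dtype=np.uint8)   # placeholder medical slice

# Prompt mode: one foreground point on the target structure.
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),   # (x, y) of a hypothetical lesion
    point_labels=np.array([1]),            # 1 = foreground point
    multimask_output=True,
)
print(masks.shape, scores)                 # candidate masks with quality scores

# Everything mode: no prompts; segment all objects in the slice.
auto_masks = SamAutomaticMaskGenerator(sam).generate(image)
print(len(auto_masks), "automatic masks")
```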
Regarding queries as cluster centers, CLUSTSEG is innovative in two aspects: 1) cluster centers are initialized in heterogeneous ways so as to pointedly address task-specific demands (e.g., instance- or category-level distinctiveness), yet without modifying the architecture; and 2) pixel-cluster assignment, formalized in a cross-attention fashion, is alternated with cluster center updates, yet without learning additional parameters. These innovations closely link CLUSTSEG to EM clustering and make it a transparent and powerful framework that yields superior results across the above segmentation tasks.", "output": "CLUSTSEG: Clustering for Universal Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Trained on a vast amount of data, Large Language Models (LLMs) have achieved unprecedented success and generalization in modeling fairly complex textual inputs in the abstract space, making them powerful tools for zero-shot learning. Such capability is extended to other modalities such as the visual domain using cross-modal foundation models such as CLIP, and as a result, semantically meaningful representations are extractable from visual inputs. In this work, we leverage this capability and propose an approach that can provide semantic insights into a model's patterns of failures and biases. Given a black-box model, its training data, and its task definition, we first calculate its task-related loss for each data point. We then extract a semantically meaningful representation for each training data point (such as CLIP embeddings from its visual encoder) and train a lightweight diagnosis model which maps this semantically meaningful representation of a data point to its task loss. We show that an ensemble of such lightweight models can be used to generate insights on the performance of the black-box model, in terms of identifying its patterns of failures and biases.", "output": "LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Subject-driven text-to-image generation aims to generate customized images of the given subject based on the text descriptions, which has drawn increasing attention recently. Existing methods mainly resort to finetuning a pretrained generative model, where the identity-relevant information and the identity-irrelevant information are entangled in the latent embedding space. However, the highly entangled latent embedding may lead to the failure of subject-driven text-to-image generation as follows: (i) the identity-irrelevant information hidden in the entangled embedding may dominate the generation process, resulting in the generated images being heavily dependent on the irrelevant information while ignoring the given text descriptions; (ii) the identity-relevant information carried in the entangled embedding cannot be appropriately preserved, resulting in an identity change of the subject in the generated images. To tackle these problems, we propose DisenBooth, an identity-preserving disentangled tuning framework for subject-driven text-to-image generation. Specifically, DisenBooth finetunes the pretrained diffusion model in the denoising process.
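The CLUSTSEG record above alternates cross-attention-style pixel-cluster assignment with cluster-center updates. The toy sketch below runs that EM-like loop on random features, assuming PyTorch; the feature dimension, cluster count, and iteration count are all illustrative.

```python
# Sketch: EM-like clustering with queries as cluster centers.
# E-step: soft-assign pixels to centers via scaled dot-product attention.
# M-step: update each center as the attention-weighted mean of pixels.
import torch

torch.manual_seed(0)
pixels = torch.randn(1024, 64)     # N pixel embeddings
centers = torch.randn(8, 64)       # K queries acting as cluster centers

for _ in range(5):
    # Cross-attention assignment: softmax over centers for each pixel.
    attn = torch.softmax(pixels @ centers.T / 64 ** 0.5, dim=-1)   # (N, K)
    # Center update: weighted mean of pixel features, no new parameters.
    centers = (attn.T @ pixels) / attn.sum(dim=0, keepdim=True).T

labels = attn.argmax(dim=-1)       # hard per-pixel cluster labels
print(labels.shape, centers.shape)
```

Because both steps reuse the attention machinery already present in a transformer decoder, the alternation adds no learnable parameters, which is the point the abstract makes.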
Different from previous works that utilize an entangled embedding to denoise each image, DisenBooth instead utilizes disentangled embeddings to respectively preserve the subject identity and capture the identity-irrelevant information. We further design the novel weak denoising and contrastive embedding auxiliary tuning objectives to achieve the disentanglement. Extensive experiments show that our proposed DisenBooth framework outperforms baseline models for subject-driven text-to-image generation with the identity-preserved embedding. Additionally, by combining the identity-preserved embedding and the identity-irrelevant embedding, DisenBooth demonstrates more generation flexibility and controllability.", "output": "DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a Large Visual Language Model (LVLM) from scratch, like GPT-4, is resource-intensive. Our paper presents a plug-and-play module for Large Language Models (LLMs), namely the Interactive Perception Network (IPN), aiming to achieve an LVLM by incorporating the image understanding capability into LLMs. Previous methods incorporate visual information into LLMs with a simple visual mapping network, where the image feature is projected into the embedding space of LLMs via a linear layer. Such a mapping network projects the image feature once yet does not consider the interaction between the image and the human input query. Hence, the obtained visual information, which has no connection with human intention, may be inadequate for LLMs to make intention-following responses; we term this static visual information. IPN addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term the dynamic interaction between the LLM and visual information. Specifically, IPN consists of a simple visual mapping network to provide the basic perception of an image for LLMs. It also contains additional modules responsible for acquiring requests from LLMs, performing request-based visual information interaction, and transmitting the resulting interacted visual information to LLMs, respectively. In this way, LLMs act to understand the human query, deliver the corresponding request to the request-based visual information interaction module, and generate the response based on the interleaved multimodal information. We evaluate IPN through extensive experiments on multimodal question answering, reasoning, and so on, demonstrating that it significantly improves the zero-shot performance of LVLMs on various multimodal tasks compared to previous methods.", "output": "LMEye: An Interactive Perception Network for Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Zero-shot sketch-based image retrieval (ZS-SBIR) is challenging due to the cross-domain nature of sketches and photos, as well as the semantic gap between seen and unseen image distributions. Previous methods fine-tune pre-trained models with various side information and learning strategies to learn a compact feature space that is shared between the sketch and photo domains and bridges seen and unseen classes.
However, these efforts are inadequate in adapting domains and transferring knowledge from seen to unseen classes. In this paper, we present an effective ``Adapt and Align'' approach to address the key challenges. Specifically, we insert simple and lightweight domain adapters to learn new abstract concepts of the sketch domain and improve cross-domain representation capabilities. Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes. Extensive experiments on three benchmark datasets and two popular backbones demonstrate the superiority of our method in terms of retrieval accuracy and flexibility.", "output": "Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This research paper describes a real-time system for identifying American Sign Language (ASL) movements that employs modern computer vision and machine learning approaches. The suggested method makes use of the Mediapipe library for feature extraction and a Convolutional Neural Network (CNN) for ASL gesture classification. The testing results show that the suggested system can detect all ASL alphabets with an accuracy of 99.95%, indicating its potential for use in communication devices for people with hearing impairments. The proposed approach can also be applied to additional sign languages with similar hand motions, potentially increasing the quality of life for people with hearing loss. Overall, the study demonstrates the effectiveness of using Mediapipe and CNN for real-time sign language recognition, making a significant contribution to the field of computer vision and machine learning.", "output": "Mediapipe and CNNs for Real-Time ASL Gesture Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, there has been rapid development in learned image compression techniques that prioritize rate-distortion-perceptual compression, preserving fine details even at lower bit-rates. However, current learning-based image compression methods often sacrifice human-friendly compression and require long decoding times. In this paper, we propose enhancements to the backbone network and loss function of existing image compression models, focusing on improving human perception and efficiency. Our proposed approach achieves competitive subjective results compared to state-of-the-art end-to-end learned image compression methods and classic methods, while requiring less decoding time and offering human-friendly compression. Through empirical evaluation, we demonstrate the effectiveness of our proposed method in achieving outstanding performance, with more than 25% bit-rate saving at the same subjective quality.", "output": "HFLIC: Human Friendly Perceptual Learned Image Compression with Reinforced Transform."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Architecture Search (NAS) aims to automatically excavate the optimal network architecture with superior test performance.
Recent neural architecture search (NAS) approaches rely on validation loss or accuracy to find the superior network for the target data. In this paper, we investigate a new neural architecture search measure for excavating architectures with better generalization. We demonstrate that the flatness of the loss surface can be a promising proxy for predicting the generalization capability of neural network architectures. We evaluate our proposed method on various search spaces, showing similar or even better performance compared to the state-of-the-art NAS methods. Notably, the resultant architecture found by the flatness measure generalizes robustly to various shifts in data distribution (e.g., ImageNet-V2, -A, -O), as well as various tasks such as object detection and semantic segmentation. Code is available at ", "output": "GeNAS: Neural Architecture Search with Better Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints, which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT), for fusing short- and long-term temporal information, which combines a temporally-rich spatial feature extractor and a multi-scale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then combines local and global features and classifies surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently. Compared to Trans-SVNet, LoViT achieves a 2.39 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.14 pp improvement on AutoLaparo. Moreover, it achieves a 5.25 pp improvement in phase-level Jaccard on AutoLaparo and a 1.55 pp improvement on Cholec80. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics, whilst introducing mechanisms that cope with long videos.", "output": "LoViT: Long Video Transformer for Surgical Phase Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated brain tumor segmentation methods are well established, reaching performance levels with clear clinical utility.
Most algorithms require four input magnetic resonance imaging (MRI) modalities, typically T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some of these sequences are often missing in clinical practice, e.g., because of time constraints and/or image artifacts (such as patient motion). Therefore, substituting missing modalities to recover segmentation performance in these scenarios is highly desirable and necessary for the more widespread adoption of such algorithms in clinical routine. In this work, we report the set-up of the Brain MR Image Synthesis Benchmark (BraSyn), organized in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023. The objective of the challenge is to benchmark image synthesis methods that realistically synthesize missing MRI modalities given multiple available images, to facilitate automated brain tumor segmentation pipelines. The image dataset is multi-modal and diverse, created in collaboration with various hospitals and research institutions.", "output": "The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep reinforcement learning (RL) has been proposed to learn the tractography procedure and train agents to reconstruct the structure of the white matter without manually curated reference streamlines. While the performances reported were competitive, the proposed framework is complex, and little is still known about the role and impact of its multiple parts. In this work, we thoroughly explore the different components of the proposed framework, such as the choice of the RL algorithm, the seeding strategy, the input signal, and the reward function, and shed light on their impact. Approximately 7,400 models were trained for this work, totalling nearly 41,000 hours of GPU time. Our goal is to guide researchers eager to explore the possibilities of deep RL for tractography by exposing what works and what does not work with this category of approach. As such, we ultimately propose a series of recommendations concerning the choice of RL algorithm, the input to the agents, the reward function, and more, to help future work using reinforcement learning for tractography. We also release the open-source codebase, trained models, and datasets for users and researchers wanting to explore reinforcement learning for tractography.", "output": "What Matters in Reinforcement Learning for Tractography."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video snapshot compressive imaging (SCI) uses a two-dimensional detector to capture consecutive video frames during a single exposure time. Following this, an efficient reconstruction algorithm needs to be designed to reconstruct the desired video frames. Although recent deep learning-based state-of-the-art (SOTA) reconstruction algorithms have achieved good results in most tasks, they still face the following challenges due to excessive model complexity and GPU memory limitations: 1) these models need high computational cost, and 2) they are usually unable to reconstruct large-scale video frames at high compression ratios.
To address these issues, we develop an efficient network for video SCI by using dense connections and a space-time factorization mechanism within a single residual block, dubbed EfficientSCI. The EfficientSCI network can well establish spatial-temporal correlation by using convolution in the spatial domain and Transformer in the temporal domain, respectively. We show, for the first time, that a UHD color video with a high compression ratio can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model with PSNR above 32 dB. Extensive results on both simulation and real data show that our method significantly outperforms all previous SOTA algorithms with better real-time performance. The code is at", "output": "EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video question answering aims at answering a question about the video content by reasoning about the alignment semantics within them. However, since they rely heavily on human instructions, i.e., annotations or priors, current contrastive learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignment. In this work, we innovatively resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate the fine-grained visual-linguistic alignment label without label-intensive efforts. Our TG-VQA outperforms existing state-of-the-art methods by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model impressively converges well on limited data (10^4 videos), surpassing most of those pre-trained on large-scale data (10^7 videos).", "output": "TG-VQA: Ternary Game of Video Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models. Although diffusion models have shown promise in analyzing functional magnetic resonance imaging (fMRI) data, including reconstructing high-quality images consistent with original visual stimuli, their accuracy in extracting semantic and silhouette information from brain signals remains limited. In this regard, we propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM). CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks. Additionally, a residual block is incorporated to capture information beyond semantic and silhouette features.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette. Through extensive experimentation, we demonstrate that CMVDM outperforms existing state-of-the-art methods both qualitatively and quantitatively.", "output": "Controllable Mind Visual Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video frame interpolation aims to generate high-quality intermediate frames from boundary frames and increase the frame rate. While existing linear, symmetric and nonlinear models are used to bridge the gap from the lack of inter-frame motion, they cannot reconstruct real motions. Event cameras, however, are ideal for capturing inter-frame dynamics with their extremely high temporal resolution. In this paper, we propose an event-and-frame-based video frame interpolation method named IDO-VFI that assigns varying amounts of computation to different sub-regions via optical flow guidance. The proposed method first estimates the optical flow based on frames and events, and then decides whether to further calculate the residual optical flow in those sub-regions via a Gumbel gating module according to the optical flow amplitude. Intermediate frames are eventually generated through a concise Transformer-based fusion network. Our proposed method maintains high-quality performance while reducing computation time and computational effort by 10% and 17% respectively on the Vimeo90K dataset, compared with a unified process on the whole region. Moreover, our method outperforms state-of-the-art frame-only and frames-plus-events methods on multiple video frame interpolation benchmarks. Codes and models are available at ", "output": "IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial in efficiently interpreting medical images with vital clinic-relevant information. Firstly, we reframe the problem of MedVQA as a generation task that naturally follows the human-machine interaction, and we propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. Secondly, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs of 149k images that cover various modalities or diseases. Thirdly, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin.
Additionally, we propose a test set that has undergone manual verification, which is significantly more challenging; even the best models struggle to solve it.", "output": "PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated domain adaptation (FDA) aims to collaboratively transfer knowledge from source clients (domains) to the related but different target client, without communicating the local data of any client. Moreover, the source clients have different data distributions, making knowledge transfer extremely challenging. Despite the recent progress in FDA, we empirically find that existing methods cannot leverage models of heterogeneous domains and thus fail to achieve excellent performance. In this paper, we propose a model-based method named FDAC, aiming to address Federated Domain Adaptation based on Contrastive learning and the Vision Transformer (ViT). In particular, contrastive learning can leverage the unlabeled data to train excellent models, and the ViT architecture performs better than convolutional neural networks (CNNs) in extracting adaptable features. To the best of our knowledge, FDAC is the first attempt to learn transferable representations by manipulating the latent architecture of ViT under the federated setting. Furthermore, FDAC can increase the target data diversity by compensating from each source model with insufficient knowledge of samples and features, based on domain augmentation and semantic matching. Extensive experiments on several real datasets demonstrate that FDAC outperforms all the comparative methods in most conditions. Moreover, FDAC can also improve communication efficiency, which is another key factor in the federated setting.", "output": "Model-Contrastive Federated Domain Adaptation."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sampling-based path planning algorithms play an important role in autonomous robotics. However, a common problem among RRT-based algorithms is that the initial path generated is not optimal and the convergence is too slow for use in real-world applications. In this paper, we propose a novel image-based learning algorithm (CBAGAN-RRT) using a Convolutional Block Attention Generative Adversarial Network with a combination of spatial and channel attention and a novel loss function to design the heuristics, find a better optimal path, and improve the convergence of the algorithm, both concerning time and speed. The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm. We train and test our network on the dataset generated by cite{zhang2021generative} and demonstrate that our algorithm outperforms the previous state-of-the-art algorithms using both image quality generation metrics like IOU Score, Dice Score, and FID score, and path planning metrics like time cost and the number of nodes. We conduct detailed experiments and ablation studies to illustrate the feasibility of our study and show that our model performs well not only on the training dataset but also on the unseen test dataset. The advantage of our approach is that we can avoid the complicated preprocessing in the state space, our model can be generalized to complicated environments like those containing turns and narrow passages without loss of accuracy, and our model can be easily integrated with other sampling-based path planning algorithms.", "output": "CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Going beyond 'dendritic democracy', we introduce a 'democracy of local processors', termed Cooperator. Here we compare their capabilities when used in permutation-invariant neural networks for reinforcement learning (RL), with machine learning algorithms based on Transformers, such as ChatGPT. Transformers are based on the long-standing conception of integrate-and-fire 'point' neurons, whereas Cooperator is inspired by recent neurobiological breakthroughs suggesting that the cellular foundations of mental life depend on context-sensitive pyramidal neurons in the neocortex which have two functionally distinct points. We show that when used for RL, an algorithm based on Cooperator learns far quicker than one based on Transformer, even while having the same number of parameters.", "output": "Cooperation Is All You Need."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cardiac diseases are one of the leading mortality factors in modern, industrialized societies, and they cause high expenses in public health systems. Due to these high costs, developing analytical methods to improve cardiac diagnostics is essential.
The heart's electrical activity was first modeled using a set of nonlinear differential equations. Following this, variations of cardiac spectra originating from deterministic dynamics are investigated. Analyzing the power spectra of a normal human heart reveals the His-Purkinje network, which possesses a fractal-like structure. Phase space trajectories are extracted from the time series electrocardiogram (ECG) graph with a third-order derivative Taylor series. In this study, phase space analysis and Convolutional Neural Networks (CNNs) are applied to 44 records from the MIT-BIH database recorded with MLII. In order to increase accuracy, a straight line is drawn between the highest Q-R distance in the phase space images of the records. Binary CNN classification is used to determine healthy or unhealthy hearts. With a 90.90% accuracy rate, this model could classify records according to their heart status.", "output": "Understanding of Normal and Abnormal Hearts by Phase Space Analysis and Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Typical parametric approaches restrict the exploration of diverse designs by generating variations based on a baseline design. In contrast, generative models provide a solution by leveraging existing designs to create compact yet diverse generative design spaces (GDSs). However, the effectiveness of current exploration methods in complex GDSs, especially in ship hull design, remains unclear. To that end, we first construct a GDS using a generative adversarial network, trained on 52,591 designs of various ship types. Next, we constructed three modes of exploration, random (REM), semi-automated (SAEM) and automated (AEM), with varying levels of user involvement to explore the GDS for novel and optimised designs. In REM, users manually explore the GDS based on intuition. In SAEM, both the user and the optimiser drive the exploration. The optimiser focuses on exploring a diverse set of optimised designs, while the user directs the exploration towards their design preference. AEM uses an optimiser to search for the global optimum based on design performance. Our results revealed that REM generates the most diverse designs, followed by SAEM and AEM. However, SAEM and AEM produce better-performing designs. Specifically, SAEM is the most effective in exploring designs with a high trade-off between novelty and performance. In conclusion, our study highlights the need for innovative exploration approaches to fully harness the potential of GDSs in design optimisation.", "output": "How does agency impact human-AI collaborative design space exploration? A case study on ship design with deep generative models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent decades, challenges have become very popular in scientific research as they are crowdsourcing schemes. In particular, challenges are essential for developing machine learning algorithms. For the challenge settings, it is vital to establish the scientific question, the dataset (with adequate quality, quantity, diversity, and complexity), performance metrics, as well as a way to authenticate the participants' results (Gold Standard).
This paper addresses the problem of evaluating the performance of different competitors (algorithms) under the restrictions imposed by the challenge scheme, such as the comparison of multiple competitors with a unique dataset (of fixed size), a minimal number of submissions, and a set of metrics chosen to assess performance. The algorithms are sorted according to the performance metric. Still, it is common to observe performance differences among competitors as small as hundredths or even thousandths, so the question is whether the differences are significant. This paper analyzes the results of the MeOffendEs@IberLEF 2021 competition and proposes to make inference through resampling techniques (bootstrap) to support challenge organizers' decision-making.", "output": "Comparison of classifiers in challenge scheme."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots. The purpose of this document is to share the excitement of the authors with the community and highlight a promising research direction in robotics and AI. The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research: deploying robots, or embodied AI agents more broadly, in various non-factory real-world settings to perform diverse tasks. This document presents a specific idea for mining knowledge in the latest large-scale foundation models for robotics research. Instead of directly adapting these models or using them to guide low-level policy learning, it advocates for using them to generate diversified tasks and scenes at scale, thereby scaling up low-level skill learning and ultimately leading to a foundation model for robotics that empowers generalist robots. The authors are actively pursuing this direction, but in the meantime, they recognize that the ambitious goal of building generalist robots with large-scale policy training demands significant resources such as computing power and hardware, and research groups in academia alone may face severe resource constraints in implementing the entire vision. Therefore, the authors believe sharing their thoughts at this early stage could foster discussions, attract interest towards the proposed pathway and related topics from industry groups, and potentially spur significant technical advancements in the field.", "output": "Towards A Foundation Model for Generalist Robots: Diverse Skill Learning at Scale via Automated Task and Scene Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time series can describe a wide range of natural and social phenomena. A few examples are climate and seismic measurement trends, stock prices, or website visits. Time-series clustering helps to find outliers that, related to these instances, could represent temperature anomalies, imminent volcanic eruptions, market disturbances, or fraudulent web traffic. Founded on the success of automatic feature extraction techniques, specifically those employing random kernels, we develop a new method for time series clustering consisting of two steps. First, a random convolutional structure transforms the data into an enhanced feature representation.
Afterwards, a clustering algorithm classifies the transformed data. The method improves on state-of-the-art results on time series clustering benchmarks.", "output": "Time Series Clustering With Random Convolutional Kernels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advancement of Deep Learning (DL) is driven by efficient Deep Neural Network (DNN) design and new hardware accelerators. Current DNN design is primarily tailored for general-purpose use and deployment on commercially viable platforms. Inference at the edge requires low latency, compact and power-efficient models, and must be cost-effective. Digital processors based on typical von Neumann architectures are not conducive to edge AI given the large amounts of required data movement in and out of memory. Conversely, analog/mixed-signal in-memory computing hardware accelerators can easily transcend the memory wall of von Neumann architectures when accelerating inference workloads. They offer increased area and power efficiency, which are paramount in edge resource-constrained environments. In this paper, we propose AnalogNAS, a framework for automated DNN design targeting deployment on analog In-Memory Computing (IMC) inference accelerators. We conduct extensive hardware simulations to demonstrate the performance of AnalogNAS on State-Of-The-Art (SOTA) models in terms of accuracy and deployment efficiency on various Tiny Machine Learning (TinyML) tasks. We also present experimental results that show AnalogNAS models achieving higher accuracy than SOTA models when implemented on a 64-core IMC chip based on Phase Change Memory (PCM). The AnalogNAS search code is released: ", "output": "AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose conditioning field initialization for neural network based topology optimization. In this work, we focus on (1) improving upon existing neural network based topology optimization and (2) demonstrating that, by using a prior initial field on the unoptimized domain, the efficiency of neural network based topology optimization can be further improved. Our approach consists of a topology neural network that is trained on a case-by-case basis to represent the geometry for a single topology optimization problem. It takes in domain coordinates as input to represent the density at each coordinate, where the topology is represented by a continuous density field. The displacement is solved through a finite element solver. We employ the strain energy field calculated on the initial design domain as an additional conditioning field input to the neural network throughout the optimization. The addition of the strain energy field input improves the convergence speed compared to standalone neural network based topology optimization.", "output": "Topology Optimization using Neural Networks with Conditioning Field Initialization for Improved Efficiency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reconstruction error-based neural architectures constitute a classical deep learning approach to anomaly detection which has shown great performance.
It consists of training an Autoencoder to reconstruct a set of examples deemed to represent the normality and then pointing out as anomalies those data that show a sufficiently large reconstruction error. Unfortunately, these architectures often become able to reconstruct the anomalies in the data well. This phenomenon is more evident when there are anomalies in the training set. In particular, when these anomalies are labeled, a setting called semi-supervised, the best way to train Autoencoders is to ignore anomalies and minimize the reconstruction error on normal data. The goal of this work is to investigate approaches that allow reconstruction error-based architectures to instruct the model to put known anomalies outside of the domain description of the normal data. Specifically, our strategy exploits a limited number of anomalous examples to increase the contrast between the reconstruction error associated with normal examples and that associated with both known and unknown anomalies, thus enhancing anomaly detection performance. The experiments show that this new procedure achieves better performance than the standard Autoencoder approach and the main deep learning techniques for semi-supervised anomaly detection.", "output": "Reconstruction Error-based Anomaly Detection with Few Outlying Examples."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most phenomena related to biomedical tasks are inherently complex, and in many cases, are expressed as signals on biomedical Knowledge Graphs (KGs). In this work, we introduce the use of a new representation framework, the Prime Adjacency Matrix (PAM), for biomedical KGs, which allows for very efficient network analysis. PAM utilizes prime numbers to enable representing the whole KG with a single adjacency matrix and the fast computation of multiple properties of the network. We illustrate the applicability of the framework in the biomedical domain by working on different biomedical knowledge graphs and by providing two case studies: one on drug repurposing for COVID-19 and one on important metapath extraction. We show that we achieve better results than the originally proposed workflows, using very simple methods that require no training, in considerably less time.", "output": "Analysing Biomedical Knowledge Graphs using Prime Adjacency Matrices."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The core purpose of developing artificial neural networks was to mimic the functionalities of biological neural networks. However, unlike biological neural networks, traditional artificial neural networks are often structured hierarchically, which can impede the flow of information between neurons, as the neurons in the same layer have no connections between them. Hence, we propose a more robust model of artificial neural networks where the hidden neurons, residing in the same hidden layer, are interconnected, enabling the neurons to learn complex patterns and speeding up the convergence rate.
With the experimental study of our proposed model as fully connected layers in shallow and deep networks, we demonstrate that the model results in a significant increase in convergence rate.", "output": "Connected Hidden Neurons (CHNNet): An Artificial Neural Network for Rapid Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vector embeddings have been successfully applied in several domains to obtain effective representations of non-numeric data which can then be used in various downstream tasks. We present a novel application of vector embeddings in professional road cycling by demonstrating a method to learn representations for riders and races based on historical results. We use unsupervised learning techniques to validate that the resultant embeddings capture interesting features of riders and races. These embeddings could be used for downstream prediction tasks such as early talent identification and race outcome prediction.", "output": "Bike2Vec: Vector Embedding Representations of Road Cycling Riders and Races."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to their high predictive performance and flexibility, machine learning models are an appropriate and efficient tool for ecologists. However, implementing a machine learning model is not yet a trivial task and may seem intimidating to ecologists with no previous experience in this area. Here we provide a series of tips to help ecologists in implementing machine learning models. We focus on classification problems, as many ecological studies aim to assign data into predefined classes such as ecological states or biological entities. Each of the nine tips identifies a common error, trap or challenge in developing machine learning models and provides recommendations to facilitate their use in ecological studies.", "output": "Nine tips for ecologists using machine learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identification and verification of molecular properties such as side effects is one of the most important and time-consuming steps in the process of molecule synthesis. For example, failure to identify side effects before submission to regulatory groups can cost companies millions of dollars and months of additional research. Failure to identify side effects during the regulatory review can also cost lives. The complexity and expense of this task have made it a candidate for a machine learning-based solution. Prior approaches rely on complex model designs and excessive parameter counts for side effect predictions. We believe reliance on complex models only shifts the difficulty away from chemists rather than alleviating the issue. Implementing large models is also expensive without prior access to high-performance computers.
We propose a heuristic approach that allows for the utilization of simple neural networks, specifically recurrent neural networks, with a 98+% reduction in the number of required parameters compared to available large language models, while still obtaining nearly identical results to top-performing models.", "output": "Predicting Side Effect of Drug Molecules using Recurrent Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to a video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a 10x smaller model using significantly less computation than the prior art.", "output": "Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Neural Networks (GNNs) have become the de facto standard tool for modeling relational data. However, while many real-world graphs are directed, the majority of today's GNN models discard this information altogether by simply making the graph undirected. The reasons for this are historical: 1) many early variants of spectral GNNs explicitly required undirected graphs, and 2) the first benchmarks on homophilic graphs did not find significant gains from using direction. In this paper, we show that in heterophilic settings, treating the graph as directed increases the effective homophily of the graph, suggesting a potential gain from the correct use of directionality information. To this end, we introduce the Directed Graph Neural Network (Dir-GNN), a novel general framework for deep learning on directed graphs. Dir-GNN can be used to extend any Message Passing Neural Network (MPNN) to account for edge directionality information by performing separate aggregations of the incoming and outgoing edges. We prove that Dir-GNN matches the expressivity of the Directed Weisfeiler-Lehman test, exceeding that of conventional MPNNs.
In extensive experiments, we validate that while our framework leaves performance unchanged on homophilic datasets, it leads to large gains over base models such as GCN, GAT and GraphSage on heterophilic benchmarks, outperforming much more complex methods and achieving new state-of-the-art results.", "output": "Edge Directionality Improves Learning on Heterophilic Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Transformer and Convolutional Neural Network (CNN) based models have shown promising results in EEG signal processing. Transformer models can capture the global dependencies in EEG signals through a self-attention mechanism, while CNN models can capture local features such as sawtooth waves. In this work, we propose an end-to-end neural epilepsy detection model, EENED, that combines CNN and Transformer. Specifically, by introducing the convolution module into the Transformer encoder, EENED can learn the time-dependent relationships of the patient's EEG signal features and notice local EEG abnormal mutations closely related to epilepsy, such as the appearance of spikes and the sprinkling of sharp and slow waves. Our proposed framework combines the ability of Transformer and CNN to capture different scale features of EEG signals and holds promise for improving the accuracy and reliability of epilepsy detection. Our source code will be released soon on GitHub.", "output": "EENED: End-to-End Neural Epilepsy Detection based on Convolutional Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence and Wasserstein distance.", "output": "Model-Free Robust Average-Reward Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we study the system identification problem for linear discrete-time systems under adversaries and analyze two lasso-type estimators. We study both asymptotic and non-asymptotic properties of these estimators in two separate scenarios, corresponding to deterministic and stochastic models for the attack times. Since the samples collected from the system are correlated, the existing results on lasso are not applicable. We show that when the system is stable and the attacks are injected periodically, the sample complexity for the exact recovery of the system dynamics is O(n), where n is the dimension of the states.
When the adversarial attacks occur at each time instance with probability p, the required sample complexity for exact recovery scales as O(log(n)p/(1-p)^2). This result implies almost sure convergence to the true system dynamics under the asymptotic regime. As a by-product, even when more than half of the data is compromised, our estimators still learn the system correctly. This paper provides the first mathematical guarantee in the literature on learning from correlated data for dynamical systems in the case when there is less clean data than corrupt data.", "output": "Exact Recovery for System Identification with More Corrupt Data than Clean Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative Language Models (GLMs) have demonstrated capabilities to store factual knowledge and answer queries efficiently. Given varying prompts, does a GLM consistently generate factually correct answers? In this paper, we introduce a statistical knowledge assessment framework guided by latent variables and the KaRR metric, which quantifies a model's knowledge by computing its continuous probability across diverse text forms. We conduct a comprehensive comparison of knowledge across 14 GLMs using our framework, including LLaMA, Alpaca, OPT, and others. Our statistical knowledge assessment encompasses 600 relation types and exhibits a strong correlation (0.43 Kendall's tau) with human evaluation. Our findings reveal that the knowledge in GLMs with the same backbone architecture adheres to the scaling law, and that tuning on instruction-following data may compromise the model's ability to generate factually correct text consistently.", "output": "Statistical Knowledge Assessment for Generative Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Off-policy reinforcement learning has been a driving force for state-of-the-art conversational AIs, leading to more natural human-agent interactions and improving user satisfaction for goal-oriented agents. However, in large-scale commercial settings, it is often challenging to balance between policy improvements and experience continuity across the broad spectrum of applications handled by such a system. In the literature, off-policy evaluation and guard-railing on aggregate statistics have been commonly used to address this problem. In this paper, we propose a method for curating and leveraging high-precision samples sourced from historical regression incident reports to validate, safeguard, and improve policies prior to the online deployment. We conducted extensive experiments using data from a real-world conversational system and actual regression incidents.
The proposed method is currently deployed in our production system to protect customers against broken experiences and enable long-term policy improvements.", "output": "Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing line of work shows how learned predictions can be used to break through worst-case barriers to improve the running time of an algorithm. However, incorporating predictions into data structures with strong theoretical guarantees remains underdeveloped. This paper takes a step in this direction by showing that predictions can be leveraged in the fundamental online list labeling problem. In the problem, n items arrive over time and must be stored in sorted order in an array of size Theta(n). The array slot of an element is its label, and the goal is to maintain sorted order while minimizing the total number of elements moved (i.e., relabeled). We design a new list labeling data structure and bound its performance in two models. In the worst-case learning-augmented model, we give guarantees in terms of the error in the predictions. Our data structure provides strong guarantees: it is optimal for any prediction error and guarantees the best-known worst-case bound even when the predictions are entirely erroneous. We also consider a stochastic error model and bound the performance in terms of the expectation and variance of the error. Finally, the theoretical results are demonstrated empirically. In particular, we show that our data structure has strong performance on real temporal data sets where predictions are constructed from elements that arrived in the past, as is typically done in a practical use case.", "output": "Online List Labeling with Predictions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning based approaches are being increasingly used for designing decoders for next generation communication systems. One widely used framework is neural belief propagation (NBP), which unfolds the belief propagation (BP) iterations into a deep neural network, and the parameters are trained in a data-driven manner. NBP decoders have been shown to improve upon classical decoding algorithms. In this paper, we investigate the generalization capabilities of NBP decoders. Specifically, the generalization gap of a decoder is the difference between the empirical and expected bit-error-rate(s). We present new theoretical results which bound this gap and show the dependence on the decoder complexity, in terms of code parameters (blocklength, message length, variable/check node degrees), decoding iterations, and the training dataset size. Results are presented for both regular and irregular parity-check matrices. To the best of our knowledge, this is the first set of theoretical results on the generalization performance of neural network based decoders.
We present experimental results to show the dependence of the generalization gap on the training dataset size and decoding iterations for different codes.", "output": "Generalization Bounds for Neural Belief Propagation Decoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Graph-Induced Sum-Product Networks (GSPNs), a new probabilistic framework for graph representation learning that can tractably answer probabilistic queries. Inspired by the computational trees induced by vertices in the context of message-passing neural networks, we build hierarchies of sum-product networks (SPNs) where the parameters of a parent SPN are learnable transformations of the a-posteriori mixing probabilities of its children's sum units. Due to weight sharing and the tree-shaped computation graphs of GSPNs, we obtain the efficiency and efficacy of deep graph networks with the additional advantages of a purely probabilistic model. We show the model's competitiveness on scarce supervision scenarios, handling missing data, and graph classification in comparison to popular neural models. We complement the experiments with qualitative analyses on hyper-parameters and the model's ability to answer probabilistic queries.", "output": "Tractable Probabilistic Graph Representation Learning with Graph-Induced Sum-Product Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The discovery of individual objectives in the collective behavior of complex dynamical systems such as fish schools and bacteria colonies is a long-standing challenge. Inverse reinforcement learning is a potent approach for addressing this challenge, but its applicability to dynamical systems, involving continuous state-action spaces and multiple interacting agents, has been limited. In this study, we tackle this challenge by introducing an off-policy inverse multi-agent reinforcement learning algorithm (IMARL). Our approach combines the ReF-ER techniques with guided cost learning. By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents. Through extensive experimentation, we demonstrate that the proposed policy captures the behavior observed in the provided data, and achieves promising results across problem domains including single agent models in the OpenAI gym and multi-agent models of schooling behavior. The present study shows that the proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents, and showcases its value as a tool for studying complex physical systems exhibiting collective behaviour.", "output": "Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate how sparse neural activity affects the generalization performance of a deep Bayesian neural network at the large width limit.
To this end, we derive a neural network Gaussian Process (NNGP) kernel with rectified linear unit (ReLU) activation and a predetermined fraction of active neurons. Using the NNGP kernel, we observe that the sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets. We validate this observation by extending the existing theory on the generalization error of kernel-ridge regression.", "output": "Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional supervised learning tasks require a label for every instance in the training set, but in many real-world applications, labels are only available for collections (bags) of instances. This problem setting, known as multiple instance learning (MIL), is particularly relevant in the medical domain, where high-resolution images are split into smaller patches, but labels apply to the image as a whole. Recent MIL models are able to capture correspondences between patches by employing self-attention, allowing them to weigh each patch differently based on all other patches in the bag. However, these approaches still do not consider the relative spatial relationships between patches within the larger image, which is especially important in computational pathology. To this end, we introduce a novel MIL model with distance-aware self-attention (DAS-MIL), which explicitly takes into account relative spatial information when modelling the interactions between patches. Unlike existing relative position representations for self-attention, which are discrete, our approach introduces continuous distance-dependent terms into the computation of the attention weights, and is the first to apply relative position representations in the context of MIL. We evaluate our model on a custom MNIST-based MIL dataset that requires the consideration of relative spatial information, as well as on CAMELYON16, a publicly available cancer metastasis detection dataset, where we achieve a test AUROC score of 0.91. On both datasets, our model outperforms existing MIL approaches that employ absolute positional encodings, as well as existing relative position representation schemes applied to MIL. Our code is available at", "output": "Deep Multiple Instance Learning with Distance-Aware Self-Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent developments related to the energy transition pose particular challenges for distribution grids. Hence, precise load forecasts become more and more important for effective grid management. Novel modeling approaches such as the Transformer architecture, in particular the Temporal Fusion Transformer (TFT), have emerged as promising methods for time series forecasting. To date, just a handful of studies apply TFTs to electricity load forecasting problems, mostly considering only single datasets and a few covariates. Therefore, we examine the potential of the TFT architecture for hourly short-term load forecasting across different time horizons (day-ahead and week-ahead) and network levels (grid and substation level).
We find that the TFT architecture does not offer higher predictive performance than a state-of-the-art LSTM model for day-ahead forecasting on the entire grid. However, the results display significant improvements for the TFT when applied at the substation level with a subsequent aggregation to the upper grid level, resulting in a prediction error of 2.43% (MAPE) for the best-performing scenario. In addition, the TFT appears to offer remarkable improvements over the LSTM approach for week-ahead forecasting (yielding a predictive error of 2.52% (MAPE) at the lowest). We outline avenues for future research using the TFT approach for load forecasting, including the exploration of various grid levels (e.g., grid, substation, and household level).", "output": "Short-Term Electricity Load Forecasting Using the Temporal Fusion Transformer: Effect of Grid Hierarchies and Data Sources."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating black-box abstaining classifier(s), however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions are crucial when, e.g., a radiologist is unsure of their diagnosis or when a driver is inattentive in a self-driving car. In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach is centered around defining the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Note that, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. Our approach is examined in both simulated and real data experiments.", "output": "Counterfactually Comparing Abstaining Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic positron emission tomography imaging (dPET) provides temporally resolved images of a tracer, enabling a quantitative measure of physiological processes. Voxel-wise physiologically-based pharmacokinetic (PBPK) modeling of the time activity curves (TAC) can provide relevant diagnostic information for clinical workflow. Conventional fitting strategies for TACs are slow and ignore the spatial relation between neighboring voxels. We train a spatio-temporal UNet to estimate the kinetic parameters given TAC from F-18-fluorodeoxyglucose (FDG) dPET.
This work introduces a self-supervised loss formulation to enforce the similarity between the measured TAC and those generated with the learned kinetic parameters. Our method provides quantitatively comparable results at the organ level to the significantly slower conventional approaches, while generating pixel-wise parametric images which are consistent with expected physiology. To the best of our knowledge, this is the first self-supervised network that allows voxel-wise computation of kinetic parameters consistent with a non-linear kinetic model. The code will become publicly available upon acceptance.", "output": "Self-Supervised Learning for Physiologically-Based Pharmacokinetic Modeling in Dynamic PET."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Following up on a previous analysis of graph embeddings, we generalize and expand some results to the general setting of vector symbolic architectures (VSA) and hyperdimensional computing (HDC). Importantly, we explore the mathematical relationship between superposition, orthogonality, and tensor product. We establish the tensor product representation as the central representation, with a suite of unique properties. These include it being the most general and expressive representation, as well as being the most compressed representation that has errorless unbinding and detection.", "output": "Tensor Products and Hyperdimensional Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%.
Code repo with all prompts:", "output": "Tree of Thoughts: Deliberate Problem Solving with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic control flow is an important technique often used to design expressive and efficient deep learning computations for applications such as text parsing, machine translation, exiting early out of deep models, and so on. However, the resulting control flow divergence makes batching, an important performance optimization, difficult to perform manually. In this paper, we present ACRoBat, a framework that enables efficient automatic batching for dynamic deep learning computations by performing hybrid static+dynamic compiler optimizations and end-to-end tensor code generation. ACRoBat performs up to 8.5X better than DyNet, a state-of-the-art framework for automatic batching, on an Nvidia GeForce RTX 3070 GPU.", "output": "ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a lot of research effort devoted by researchers to developing different techniques for neural network compression, yet the community seems to lack standardised ways of evaluating and comparing different compression techniques, which is key to identifying the most suitable compression technique for different applications. In this paper we contribute towards the standardisation of neural network compression by providing a review of evaluation metrics. These metrics have been implemented into NetZIP, a standardised neural network compression bench. We showcase some of the metrics reviewed using three case studies focusing on object classification, object detection, and edge devices.", "output": "Evaluation Metrics for CNNs Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Empirical studies on the landscape of neural networks have shown that low-energy configurations are often found in complex connected structures, where zero-energy paths between pairs of distant solutions can be constructed. Here we consider the spherical negative perceptron, a prototypical non-convex neural network model framed as a continuous constraint satisfaction problem. We introduce a general analytical method for computing energy barriers in the simplex with vertex configurations sampled from the equilibrium. We find that in the over-parameterized regime the solution manifold displays simple connectivity properties. There exists a large geodesically convex component that is attractive for a wide range of optimization dynamics. Inside this region we identify a subset of atypically robust solutions that are geodesically connected with most other solutions, giving rise to a star-shaped geometry.
We analytically characterize the organization of the connected space of solutions and show numerical evidence of a transition, at larger constraint densities, where the aforementioned simple geodesic connectivity breaks down.", "output": "The star-shaped space of solutions of the spherical negative perceptron."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Neural Networks (DNNs) are becoming integral components of real-world services relied upon by millions of users. Unfortunately, architects of these systems can find it difficult to ensure reliable performance as irrelevant details like random initialization can unexpectedly change the outputs of a trained system with potentially disastrous consequences. We formulate the model stability problem by studying how the predictions of a model change, even when it is retrained on the same data, as a consequence of stochasticity in the training process. For Natural Language Understanding (NLU) tasks, we find instability in predictions for a significant fraction of queries. We formulate principled metrics, like per-sample ``label entropy'' across training runs or within a single training run, to quantify this phenomenon. Intriguingly, we find that unstable predictions do not appear at random, but rather appear to be clustered in data-specific ways. We study data-agnostic regularization methods to improve stability and propose new data-centric methods that exploit our local stability estimates. We find that our localized data-specific mitigation strategy dramatically outperforms data-agnostic methods, and comes within 90% of the gold standard, achieved by ensembling, at a fraction of the computational cost.", "output": "Measuring and Mitigating Local Instability in Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study aimed to solve the semantic gap and misalignment issue between encoding and decoding caused by multiple convolutional and pooling operations in U-Net when segmenting subabdominal MRI images during rectal cancer treatment. An MRI image segmentation method is proposed based on a multi-scale feature pyramid network and a dual attention mechanism. Our innovation is the design of two modules: 1) a dilated convolution and a multi-scale feature pyramid network are used in the encoding to avoid the semantic gap; 2) a dual attention mechanism is designed to maintain the spatial information of U-Net and reduce misalignment. Experiments on a subabdominal MRI image dataset show the proposed method achieves better performance than other methods.
In conclusion, a multi-scale feature pyramid network can reduce the semantic gap, and the dual attention mechanism can align features between encoding and decoding.", "output": "A Subabdominal MRI Image Segmentation Algorithm Based on Multi-Scale Feature Pyramid Network and Dual Attention Mechanism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We focus on the task of learning a single index model $\\sigma(w^\\star \\cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\\star$ is governed by the information exponent $k^\\star$ of the link function $\\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\\sigma$. Ben Arous et al. (2021) showed that $n \\gtrsim d^{k^\\star-1}$ samples suffice for learning $w^\\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient-based methods only shows that $n \\gtrsim d^{k^\\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\\star$ with $n \\gtrsim d^{k^\\star/2}$ samples. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.", "output": "Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gauss-Newton methods and their stochastic version have been widely used in machine learning and signal processing. Their nonsmooth counterparts, modified Gauss-Newton or prox-linear algorithms, can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings. We explore the contrasting performance of these two classes of algorithms in theory on a stylized statistical example, and experimentally on learning problems including structured prediction. In theory, we delineate the regime where the quadratic convergence of the modified Gauss-Newton method is active under statistical noise. In the experiments, we underline the versatility of stochastic (sub)-gradient descent to minimize nonsmooth composite objectives.", "output": "Modified Gauss-Newton Algorithms under Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stein Variational Gradient Descent (SVGD) is a popular particle-based method for Bayesian inference. However, its convergence suffers from the variance collapse, which reduces the accuracy and diversity of the estimation. In this paper, we study the isotropy property of finite particles during the convergence process and show that SVGD of finite particles cannot spread across the entire sample space. Instead, all particles tend to cluster around the particle center within a certain range, and we provide an analytical bound for this cluster. To further improve the effectiveness of SVGD for high-dimensional problems, we propose the Augmented Message Passing SVGD (AUMP-SVGD) method, which is a two-stage optimization procedure that does not require sparsity of the target distribution, unlike the MP-SVGD method.
Our algorithm achieves satisfactory accuracy and overcomes the variance collapse problem in various benchmark problems.", "output": "Augmented Message Passing Stein Variational Gradient Descent."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of root cause analysis (RCA) is to identify the root causes of system faults/failures by analyzing system monitoring data. Efficient RCA can greatly accelerate system failure recovery and mitigate system damages or financial losses. However, previous research has mostly focused on developing offline RCA algorithms, which often require manually initiating the RCA process, a significant amount of time and data to train a robust model, and then being retrained from scratch for a new system fault. In this paper, we propose CORAL, a novel online RCA framework that can automatically trigger the RCA process and incrementally update the RCA model. CORAL consists of Trigger Point Detection, Incremental Disentangled Causal Graph Learning, and Network Propagation-based Root Cause Localization. The Trigger Point Detection component aims to detect system state transitions automatically and in near-real-time. To achieve this, we develop an online trigger point detection approach based on multivariate singular spectrum analysis and cumulative sum statistics. To efficiently update the RCA model, we propose an incremental disentangled causal graph learning approach to decouple the state-invariant and state-dependent information. After that, CORAL applies a random walk with restarts to the updated causal graph to accurately identify root causes. The online RCA process terminates when the causal graph and the generated root cause list converge. Extensive experiments on three real-world datasets with case studies demonstrate the effectiveness and superiority of the proposed framework.", "output": "Incremental Causal Graph Learning for Online Unsupervised Root Cause Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks have consistently shown great performance in several real-world use cases like autonomous vehicles, satellite imaging, etc., effectively leveraging large corpora of labeled training data. However, learning unbiased models depends on building a dataset that is representative of a diverse range of realistic scenarios for a given task. This is challenging in many settings where data comes from high-volume streams, with each scenario occurring in random interleaved episodes at varying frequencies. We study realistic streaming settings where data instances arrive in and are sampled from an episodic multi-distributional data stream. Using submodular information measures, we propose STREAMLINE, a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled data via a three-step procedure of slice identification, slice-aware budgeting, and data selection. We extensively evaluate STREAMLINE on real-world streaming scenarios for image classification and object detection tasks.
We observe that STREAMLINE improves the performance on infrequent yet critical slices of the data over current baselines by up to $5\\%$ in terms of accuracy on our image classification tasks and by up to $8\\%$ in terms of mAP on our object detection tasks.", "output": "STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation that allows automatic and semi-automatic segmentation, and click-based refinement. DeepEdit combines the power of two methods: a non-interactive (i.e. automatic segmentation using nnU-Net, UNET or UNETR) and an interactive segmentation method (i.e. DeepGrow), into a single deep learning model. It allows easy integration of uncertainty-based ranking strategies (i.e. aleatoric and epistemic uncertainty computation) and active learning. We propose and implement a method for training DeepEdit by using standard training combined with user interaction simulation. Once trained, DeepEdit allows clinicians to quickly segment their datasets by using the algorithm in auto segmentation mode or by providing clicks via a user interface (i.e. 3D Slicer, OHIF). We show the value of DeepEdit through evaluation on the PROSTATEx dataset for prostate/prostatic lesions and the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) dataset for abdominal CT segmentation, using state-of-the-art network architectures as the baseline for comparison. DeepEdit could reduce the time and effort of annotating 3D medical images compared to DeepGrow alone. Source code is available at", "output": "DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior research on addressing this issue focused on using speaker-identity only. To this end, this paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognition: a) multitask training incorporating severity prediction error; b) speaker-severity aware auxiliary feature adaptation; and c) structured LHUC transforms separately conditioned on speaker-identity and severity. Experiments conducted on UASpeech suggest incorporating additional speech impairment severity into state-of-the-art hybrid DNN, E2E Conformer and pre-trained Wav2vec 2.0 ASR systems produced statistically significant WER reductions up to 4.78% (14.03% relative).
Using the best system, the lowest published WER of 17.82% (51.25% on very low intelligibility) was obtained on UASpeech.", "output": "Use of Speech Impairment Severity for Dysarthric Speech Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "From the classical and influential works of Neal (1996), it is known that the infinite width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process, \\emph{when the network weights have bounded prior variance}. Neal's result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits. The tractable properties of Gaussian processes then allow straightforward posterior inference and uncertainty quantification, considerably simplifying the study of the limit process compared to a network of finite width. Neural network weights with unbounded variance, however, pose unique challenges. In this case, the classical central limit theorem breaks down, and it is well known that the scaling limit is an $\\alpha$-stable process under suitable conditions. However, current literature is primarily limited to forward simulations under these processes, and the problem of posterior inference under such a scaling limit remains largely unaddressed, unlike in the Gaussian process case. To this end, our contribution is an interpretable and computationally efficient procedure for posterior inference, using a \\emph{conditionally Gaussian} representation, that then allows full use of the Gaussian process machinery for tractable posterior inference and uncertainty quantification in the non-Gaussian regime.", "output": "Posterior Inference on Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph anomaly detection has long been an important problem in various domains pertaining to information security such as financial fraud, social spam, network intrusion, etc. The majority of existing methods are performed in an unsupervised manner, as labeled anomalies at a large scale are often too expensive to acquire. However, the identified anomalies may turn out to be data noises or uninteresting data instances due to the lack of prior knowledge on the anomalies. In realistic scenarios, it is often feasible to obtain limited labeled anomalies, which have great potential to advance graph anomaly detection. However, work that explores limited labeled anomalies together with a large amount of unlabeled nodes in graphs to detect anomalies is rather scarce. Therefore, in this paper, we study a novel problem of few-shot graph anomaly detection.
We propose a new framework, MetaGAD, to learn to meta-transfer the knowledge between unlabeled and labeled nodes for graph anomaly detection. Experimental results on six real-world datasets with synthetic anomalies and \"organic\" anomalies (available in the dataset) demonstrate the effectiveness of the proposed approach in detecting anomalies with limited labeled anomalies.", "output": "MetaGAD: Learning to Meta Transfer for Few-shot Graph Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The prevalence of large-scale graphs poses great challenges in time and storage for training and deploying graph neural networks (GNNs). Several recent works have explored solutions for pruning the large original graph into a small and highly-informative one, such that training and inference on the pruned and large graphs have comparable performance. Although empirically effective, current research focuses on static or non-temporal graphs, which are not directly applicable to dynamic scenarios. In addition, they require labels as ground truth to learn the informative structure, limiting their applicability to new problem domains where labels are hard to obtain. To solve the dilemma, we propose and study the problem of unsupervised graph pruning on dynamic graphs. We approach the problem with our proposed STEP, a self-supervised temporal pruning framework that learns to remove potentially redundant edges from input dynamic graphs. From a technical and industrial viewpoint, our method overcomes the trade-offs between the performance and the time & memory overheads. Our results on three real-world datasets demonstrate the advantages in improving the efficacy, robustness, and efficiency of GNNs on dynamic node classification tasks. Most notably, STEP is able to prune more than 50% of edges on a million-scale industrial graph Alipay (7M nodes, 21M edges) while approximating up to 98% of the original performance. Code is available at", "output": "Less Can Be More: Unsupervised Graph Pruning for Large-scale Dynamic Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the first black-box targeted attack against online deep reinforcement learning through reward poisoning during training time. Our attack is applicable to general environments with unknown dynamics learned by unknown algorithms and requires limited attack budgets and computational resources. We leverage a general framework and find conditions to ensure efficient attack under a general assumption of the learning algorithms. We show that our attack is optimal in our framework under the conditions. We experimentally verify that with limited budgets, our attack efficiently leads the learning agent to various target policies under a diverse set of popular DRL environments and state-of-the-art learners.", "output": "Black-Box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusions are a successful technique for sampling from high-dimensional distributions, which can be either explicitly given or learnt from a collection of samples.
They implement a diffusion process whose endpoint is a sample from the target distribution and whose drift is typically represented as a neural network. Stochastic localization is a successful technique to prove mixing of Markov Chains and other functional inequalities in high dimension. An algorithmic version of stochastic localization was introduced in [EAMS2022] to obtain an algorithm that samples from certain statistical mechanics models. These notes have three objectives: (i) Generalize the construction of [EAMS2022] to other stochastic localization processes; (ii) Clarify the connection between diffusions and stochastic localization. In particular, we show that standard denoising diffusions are stochastic localizations, and we give other examples that are naturally suggested by the proposed viewpoint; (iii) Describe some insights that follow from this viewpoint.", "output": "Sampling, Diffusions, and Stochastic Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Precisely forecasting the excess returns of an asset (e.g., Tesla stock) is beneficial to all investors. However, the unpredictability of market dynamics, influenced by human behaviors, makes this a challenging task. In prior research, researchers have manually crafted a number of factors as signals to guide their investing process. In contrast, this paper views this problem from a different perspective: we align a deep learning model to combine those human-designed factors to predict the trend of excess returns. To this end, we present a 5-layer deep neural network that generates more meaningful factors in a 2048-dimensional space. Modern network design techniques are utilized to enhance robustness of training and reduce overfitting. Additionally, we propose a gated network that dynamically filters out noise-learned features, resulting in improved performance. We evaluate our model over 2,000 stocks from the China market with their recent three years of records. The experimental results show that the proposed gated activation layer and deep neural network contribute to the superior performance of our model and could effectively overcome the problem. In summary, the proposed model exhibits promising results and could potentially benefit investors seeking to optimize their investment strategies.", "output": "Gated Deeper Models are Effective Factor Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split.
Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and Catboost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods. The codes are available at", "output": "Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When the data used for reinforcement learning (RL) are collected by multiple agents in a distributed manner, federated versions of RL algorithms allow collaborative learning without the need to share local data. In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning. In both cases, our bounds exhibit a linear speedup with respect to the number of agents and sharper dependencies on other salient problem parameters. Moreover, existing approaches to federated Q-learning adopt an equally-weighted averaging of local Q-estimates, which can be highly sub-optimal in the asynchronous setting since the local trajectories can be highly heterogeneous due to different local behavior policies. The existing sample complexity scales inversely with the minimum entry of the stationary state-action occupancy distributions over all agents, requiring that every agent covers the entire state-action space. Instead, we propose a novel importance averaging algorithm, giving larger weights to more frequently visited state-action pairs. The improved sample complexity scales inversely with the minimum entry of the average stationary state-action occupancy distribution of all agents, thus only requiring the agents to collectively cover the entire state-action space, unveiling the blessing of heterogeneity.", "output": "The Blessing of Heterogeneity in Federated Q-learning: Linear Speedup and Beyond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Thousands of crimes happen daily all around, but statistics are kept for only a few of them, and as a result crime rates are increasing day by day. The reason behind this may be a lack of concern about, or a lack of statistics on, previous crimes. Observing previous crime statistics is important for the general public when making outing decisions, for the police when taking steps to catch criminals and restrain crime, and for tourists when making travelling decisions. The National Institute of Justice releases crime survey data for the country, but does not offer crime statistics down to the Union or Thana level.
Considering all of these cases, we have come up with an approach which can give people an approximation of the safety of a specific location, with a crime ranking of different areas, locating the crimes on a map, and including a future crime occurrence prediction mechanism. Our approach relies on different online Bangla newspapers for crawling the crime data, stemming and keyword extraction, a location finding algorithm, cosine similarity, a naive Bayes classifier, and a custom crime prediction model.", "output": "Ranking the locations and predicting future crime occurrence by retrieving news from different Bangla online newspapers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. The score-based generative stochastic differential equation (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data. To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. This makes diffusion in continuous space natural for modeling discrete data. We refer to this approach as the Dirichlet diffusion score model. We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences.", "output": "Dirichlet Diffusion Score Model for Biological Sequence Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks. Different from prior works that generate training data with billion-scale natural language generation (NLG) models, we propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus. To realize this, we first conduct contrastive pretraining to learn an unsupervised dense retriever for extracting the most relevant documents using class-descriptive verbalizers. We then further propose two simple strategies, namely Verbalizer Augmentation with Demonstrations and Self-consistency Guided Filtering, to improve the topic coverage of the dataset while removing noisy examples.
Experiments on nine datasets demonstrate that REGEN achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared to baselines using large NLG models. Besides, REGEN can be naturally integrated with recently proposed large language models to boost performance.", "output": "ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a framework and an algorithm to uncover the unknown parts of nonlinear equations directly from data. The framework is based on eXtended Physics-Informed Neural Networks (X-PINNs) and domain decomposition in space-time, but we augment the original X-PINN method by imposing flux continuity across the domain interfaces. The well-known Allen-Cahn equation is used to demonstrate the approach. The Frobenius matrix norm is used to evaluate the accuracy of the X-PINN predictions, and the results show excellent performance. In addition, symbolic regression is employed to determine the closed form of the unknown part of the equation from the data, and the results confirm the accuracy of the X-PINN-based approach. To test the framework in a situation resembling real-world data, random noise is added to the datasets to mimic scenarios such as the presence of thermal noise or instrument errors. The results show that the framework is stable against a significant amount of noise. As the final part, we determine the minimal amount of data required for training the neural network. The framework is able to predict the correct form and coefficients of the underlying dynamical equation when at least 50% of the data is used for training.", "output": "A Framework Based on Symbolic Regression Coupled with eXtended Physics-Informed Neural Networks for Gray-Box Learning of Equations of Motion from Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With growing capabilities of large language models, prompting them has become the dominant way to access them. This has motivated the development of strategies for automatically selecting effective language prompts. In this paper, we introduce prompt flatness, a new metric to quantify the expected utility of a language prompt. This metric is inspired by flatness regularization in statistical learning that quantifies the robustness of the model towards its parameter perturbations. We provide theoretical foundations for this metric and its relationship with other prompt selection metrics, providing a comprehensive understanding of existing methods. Empirically, we show that combining prompt flatness with existing metrics improves both performance and sample efficiency. Our metric outperforms the previous prompt selection metrics with an average increase of 5% in accuracy and 10% in Pearson correlation across 6 classification benchmarks.", "output": "Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time-Series Mining (TSM) is an important research area since it shows great potential in practical applications. Deep learning models that rely on massive labeled data have been utilized for TSM successfully.
However, constructing a large-scale well-labeled dataset is difficult due to data annotation costs. Recently, Pre-Trained Models have gradually attracted attention in the time series domain due to their remarkable performance in computer vision and natural language processing. In this survey, we provide a comprehensive review of Time-Series Pre-Trained Models (TS-PTMs), aiming to guide the understanding, application, and study of TS-PTMs. Specifically, we first briefly introduce the typical deep learning models employed in TSM. Then, we give an overview of TS-PTMs according to the pre-training techniques. The main categories we explore include supervised, unsupervised, and self-supervised TS-PTMs. Further, extensive experiments are conducted to analyze the advantages and disadvantages of transfer learning strategies, Transformer-based models, and representative TS-PTMs. Finally, we point out some potential directions of TS-PTMs for future work.", "output": "A Survey on Time-Series Pre-Trained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Non-stationary multi-armed bandit (NS-MAB) problems have recently received significant attention. NS-MABs are typically modelled in two scenarios: abruptly changing, where reward distributions remain constant for a certain period and change at unknown time steps, and smoothly changing, where reward distributions evolve smoothly based on unknown dynamics. In this paper, we propose Discounted Thompson Sampling (DS-TS) with Gaussian priors to address both non-stationary settings. Our algorithm passively adapts to changes by incorporating a discount factor into Thompson Sampling. The DS-TS method has been experimentally validated, but an analysis of the regret upper bound has so far been lacking. Under mild assumptions, we show that DS-TS with Gaussian priors can achieve a nearly optimal regret bound on the order of $\\tilde{O}(\\sqrt{TB_T})$ for abruptly changing and $\\tilde{O}(T^{\\beta})$ for smoothly changing, where $T$ is the number of time steps, $B_T$ is the number of breakpoints, $\\beta$ is associated with the smoothly changing environment, and $\\tilde{O}$ hides the parameters independent of $T$ as well as logarithmic terms. Furthermore, empirical comparisons between DS-TS and other non-stationary bandit algorithms demonstrate its competitive performance. Specifically, when prior knowledge of the maximum expected reward is available, DS-TS has the potential to outperform state-of-the-art algorithms.", "output": "Discounted Thompson Sampling for Non-Stationary Bandit Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Long-term time series forecasting has gained significant attention in recent years. While there are various specialized designs for capturing temporal dependency, previous studies have demonstrated that a single linear layer can achieve competitive forecasting performance compared to other complex architectures.
In this paper, we thoroughly investigate the intrinsic effectiveness of recent approaches and make three key observations: 1) linear mapping is critical to prior long-term time series forecasting efforts; 2) RevIN (reversible normalization) and CI (Channel Independent) play a vital role in improving overall forecasting performance; and 3) linear mapping can effectively capture periodic features in time series and has robustness for different periods across channels when increasing the input horizon. We provide theoretical and experimental explanations to support our findings and also discuss the limitations and future work. Our framework's code is available at \\url{", "output": "Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The transformer extends its success from the language to the vision domain. Because of the stacked self-attention and cross-attention blocks, the acceleration deployment of vision transformers on GPU hardware is challenging and also rarely studied. This paper thoroughly designs a compression scheme to maximally utilize the GPU-friendly 2:4 fine-grained structured sparsity and quantization. Specifically, an original large model with dense weight parameters is first pruned into a sparse one by 2:4 structured pruning, which considers the GPU's acceleration of the 2:4 structured sparse pattern with FP16 data type; then the floating-point sparse model is further quantized into a fixed-point one by sparse-distillation-aware quantization aware training, which considers that GPUs can provide an extra speedup for 2:4 sparse calculation with integer tensors. A mixed-strategy knowledge distillation is used during the pruning and quantization process. The proposed compression scheme is flexible to support supervised and unsupervised learning styles. Experimental results show the GPUSQ-ViT scheme achieves state-of-the-art compression by reducing vision transformer models 6.4-12.7 times in model size and 30.3-62 times in FLOPs with negligible accuracy degradation on ImageNet classification, COCO detection and ADE20K segmentation benchmarking tasks. Moreover, GPUSQ-ViT can boost actual deployment performance by 1.39-1.79 times and 3.22-3.43 times in latency and throughput on an A100 GPU, and 1.57-1.69 times and 2.11-2.51 times improvement in latency and throughput on AGX Orin.", "output": "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although Federated Learning (FL) enables global model training across clients without compromising their raw data, existing Federated Averaging (FedAvg)-based methods suffer from the problem of low inference performance, especially for unevenly distributed data among clients. This is mainly because i) FedAvg initializes client models with the same global models, which makes the local training hard to escape from the local search for optimal solutions; and ii) by averaging model parameters in a coarse manner, FedAvg eclipses the individual characteristics of local models. To address such issues that strongly limit the inference capability of FL, we propose a novel and effective FL paradigm named FedMR (Federated Model Recombination).
Unlike conventional FedAvg-based methods, the cloud server of FedMR shuffles each layer of collected local models and recombines them to achieve new models for local training on clients. Due to the diversified initialization models for clients coupled with fine-grained model recombination, FedMR can converge to a well-generalized global model for all the clients, leading to superior inference performance. Experimental results show that, compared with state-of-the-art FL methods, FedMR can significantly improve inference accuracy in a quicker manner without exposing client privacy.", "output": "FedMR: Federated Learning via Model Recombination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep graph clustering has recently received significant attention due to its ability to enhance the representation learning capabilities of models in unsupervised scenarios. Nevertheless, deep clustering for temporal graphs, which could capture crucial dynamic interaction information, has not been fully explored. This means that in many clustering-oriented real-world scenarios, temporal graphs can only be processed as static graphs. This not only causes the loss of dynamic information but also triggers huge computational consumption. To solve the problem, we propose a general framework for deep Temporal Graph Clustering called TGC, which adjusts deep clustering techniques (clustering assignment distribution and adjacency matrix reconstruction) to suit the interaction sequence-based batch-processing pattern of temporal graphs. In addition, we discuss differences between temporal graph clustering and existing static graph clustering at several levels. To verify the superiority of the proposed framework TGC, we conduct extensive experiments. The experimental results show that temporal graph clustering enables more flexibility in finding a balance between time and space requirements, and our framework can effectively improve the performance of existing temporal graph learning methods. Our code and supplementary material will be released after publication.", "output": "Deep Temporal Graph Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Animal-borne sensors ('bio-loggers') can record a suite of kinematic and environmental data, which can elucidate animal ecophysiology and improve conservation efforts. Machine learning techniques are useful for interpreting the large amounts of data recorded by bio-loggers, but there exists no standard for comparing the different machine learning techniques in this domain. To address this, we present the Bio-logger Ethogram Benchmark (BEBE), a collection of datasets with behavioral annotations, standardized modeling tasks, and evaluation metrics. BEBE is to date the largest, most taxonomically diverse, publicly available benchmark of this type, and includes 1654 hours of data collected from 149 individuals across nine taxa. We evaluate the performance of ten different machine learning methods on BEBE, and identify key challenges to be addressed in future work.
Datasets, models, and evaluation code are made publicly available at to enable community use of BEBE as a point of comparison in methods development.", "output": "A benchmark for computational analysis of animal behavior, using animal-borne tags."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online resource allocation problem in an episodic finite-horizon Markov decision process with unknown non-stationary transitions and stochastic non-stationary reward and resource consumption functions for each episode. We provide an equivalent online linear programming reformulation based on occupancy measures, for which we develop an online mirror descent algorithm. Our online dual mirror descent algorithm for resource allocation deals with uncertainties and errors in estimating the true feasible set, which is of independent interest. We prove that under stochastic reward and resource consumption functions, the expected regret of the online mirror descent algorithm is bounded by $O(\\rho^{-1}{H^{3/2}}S\\sqrt{AT})$ where $\\rho \\in (0,1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes.", "output": "Online Resource Allocation in Episodic Markov Decision Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prior beliefs about the latent function that shape inductive biases can be incorporated into a Gaussian Process (GP) via the kernel. However, beyond kernel choices, the decision-making process of GP models remains poorly understood. In this work, we contribute an analysis of the loss landscape for GP models using methods from physics. We demonstrate $\\nu$-continuity for Matern kernels and outline aspects of catastrophe theory at critical points in the loss landscape. By directly including $\\nu$ in the hyperparameter optimisation for Matern kernels, we find that typical values of $\\nu$ are far from optimal in terms of performance, yet prevail in the literature due to the increased computational speed. We also provide an a priori method for evaluating the effect of GP ensembles and discuss various voting approaches based on physical properties of the loss landscape. The utility of these approaches is demonstrated for various synthetic and real datasets. Our findings provide an enhanced understanding of the decision-making process behind GPs and offer practical guidance for improving their performance and interpretability in a range of applications.", "output": "Physics Inspired Approaches Towards Understanding Gaussian Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed the great success of Graph Neural Networks (GNNs) in handling graph-related tasks. However, MLPs remain the primary workhorse for practical industrial applications due to their desirable inference efficiency and scalability. To reduce their gaps, one can directly distill knowledge from a well-designed teacher GNN to a student MLP, which is termed GNN-to-MLP distillation.
However, the process of distillation usually entails a loss of information, and ``which knowledge patterns of GNNs are more likely to be left and distilled into MLPs?\" becomes an important question. In this paper, we first factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain and then derive their correspondence in the spatial domain. Furthermore, we identify a potential information drowning problem for existing GNN-to-MLP distillation, i.e., the high-frequency knowledge of the pre-trained GNNs may be overwhelmed by the low-frequency knowledge during distillation; we describe in detail what it represents, how it arises, what impact it has, and how to deal with it. We then propose an efficient Full-Frequency GNN-to-MLP (FF-G2M) distillation framework, which extracts both low-frequency and high-frequency knowledge from GNNs and injects it into MLPs. Extensive experiments show that FF-G2M improves over the vanilla MLPs by 12.6% and outperforms its corresponding teacher GNNs by 2.6% averaged over six graph datasets and three common GNN architectures.", "output": "Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The layout design of pipelines is a critical task in the construction industry. Currently, pipeline layout is designed manually by engineers, which is time-consuming and laborious. Automating and streamlining this process can reduce the burden on engineers and save time. In this paper, we propose a method for generating three-dimensional layouts of pipelines based on deep reinforcement learning (DRL). Firstly, we abstract the geometric features of space to establish a training environment and define reward functions based on three constraints: pipeline length, elbow, and installation distance. Next, we collect data through interactions between the agent and the environment and train the DRL model. Finally, we use the well-trained DRL model to automatically design a single pipeline. Our results demonstrate that DRL models can complete the pipeline layout task in space in a much shorter time than traditional algorithms while ensuring high-quality layout outcomes.", "output": "Automatic Design Method of Building Pipeline Layout Based on Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically involves performing a large number of sampling steps, which impedes the possibility of real-time sample synthesis. Traditional accelerated sampling algorithms via knowledge distillation rely on pre-trained model weights and discrete time step scenarios, necessitating additional training sessions to achieve their goals.
To address these issues, we propose Catch-Up Distillation (CUD), which encourages the current moment output of the velocity estimation model to \"catch up\" with its previous moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current moment output with both the ground truth label and the previous moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space for CUD under continuous time-step scenarios and analyze how to determine suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain an FID of 2.80 by sampling in 15 steps under one-session training and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. This latter result necessitated only 620K iterations with a batch size of 128, in contrast to Consistency Distillation, which demanded 2.1M iterations with a larger batch size of 256.", "output": "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed the rapid development of heterogeneous graph neural networks (HGNNs) in information retrieval (IR) applications. Many existing HGNNs design a variety of tailor-made graph convolutions to capture structural and semantic information in heterogeneous graphs. However, existing HGNNs usually represent each node as a single vector in the multi-layer graph convolution calculation, which makes the high-level graph convolution layer fail to distinguish information from different relations and different orders, resulting in information loss in the message passing. To this end, we propose a novel heterogeneous graph neural network with sequential node representation, namely Seq-HGNN. To avoid the information loss caused by the single vector node representation, we first design a sequential node representation learning mechanism to represent each node as a sequence of meta-path representations during the node message passing. Then we propose a heterogeneous representation fusion module, empowering Seq-HGNN to identify important meta-paths and aggregate their representations into a compact one. We conduct extensive experiments on four widely used datasets from Heterogeneous Graph Benchmark (HGB) and Open Graph Benchmark (OGB). Experimental results show that our proposed method outperforms state-of-the-art baselines in both accuracy and efficiency. The source code is available at ", "output": "Seq-HGNN: Learning Sequential Node Representation on Heterogeneous Graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate analysis of speech articulation is crucial for speech analysis. However, X-Y coordinates of articulators strongly depend on the anatomy of the speakers and the variability of pellet placements, and existing methods for mapping anatomical landmarks in the X-ray Microbeam Dataset (XRMB) fail to capture the entire anatomy of the vocal tract.
In this paper, we propose a new geometric transformation that improves the accuracy of these measurements. Our transformation maps anatomical landmarks' X-Y coordinates along the midsagittal plane onto six relative measures: Lip Aperture (LA), Lip Protrusion (LP), Tongue Body Constriction Location (TBCL) and Degree (TBCD), and Tongue Tip Constriction Location (TTCL) and Degree (TTCD). Our novel contribution is the extension of the palate trace towards the inferred anterior pharyngeal line, which improves measurements of tongue body constriction.", "output": "Enhancing Speech Articulation Analysis using a Geometric Transformation of the X-ray Microbeam Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the potential benefits of Diffusion Models for NLP applications, publicly available implementations, trained models, and reproducible training procedures are currently lacking. We present the Democratized Diffusion Language Model (DDLM), based on the Continuous Diffusion for Categorical Data (CDCD) framework, to address these challenges. We propose a simplified training procedure for DDLM using the C4 dataset and perform an in-depth analysis of the trained model's behavior. Furthermore, we introduce a novel early-exiting strategy for faster sampling with models trained with score interpolation. Since no previous works aimed at solving downstream tasks with pre-trained Diffusion LMs (e.g., classification tasks), we experimented with the GLUE Benchmark to study the ability of DDLM to transfer knowledge. With this paper, we provide training and evaluation pipelines and pre-trained DDLM models to other researchers, which could be used in future research with Diffusion LMs.", "output": "Democratized Diffusion Language Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality. We replaced each encoder block with an STFT, with parameters equal to the temporal resolution of each decoder block, leading to the skip connection. FastFit reduces the number of parameters and the generation time of the model by almost half while maintaining high fidelity. Through objective and subjective evaluations, we demonstrated that the proposed model achieves nearly twice the generation speed of baseline iteration-based vocoders while maintaining high sound quality. We further showed that FastFit produces sound quality similar to that of other baselines in text-to-speech evaluation scenarios, including multi-speaker and zero-shot text-to-speech.", "output": "FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel parameter-efficient fine-tuning method for pre-trained Language Models (LMs) that adds input-dependent bias before each Transformer layer.
We evaluate AoT P-Tuning on GLUE and SuperGLUE benchmarking datasets using RoBERTa and DeBERTa models, showing that it outperforms BitFit and is comparable to or better than other baseline methods for efficient fine-tuning. Additionally, we assess the inference overhead of AoT P-Tuning and demonstrate that it introduces negligible overhead compared to established baseline methods. Our method enables multi-task inference with a single backbone LM, making it a practical solution for real-world applications.", "output": "Ahead-of-Time P-Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed the growing popularity of domain-specific accelerators (DSAs), such as Google's TPUs, for accelerating various applications such as deep learning, search, autonomous driving, etc. To facilitate DSA designs, high-level synthesis (HLS) is used, which allows a developer to compile a high-level description in the form of software code in C and C++ into a design in low-level hardware description languages (such as VHDL or Verilog) that is eventually synthesized into a DSA on an ASIC (application-specific integrated circuit) or FPGA (field-programmable gate arrays). However, existing HLS tools still require microarchitecture decisions, expressed in terms of pragmas (such as directives for parallelization and pipelining). To enable more people to design DSAs, it is desirable to automate such decisions with the help of deep learning for predicting the quality of HLS designs. This requires a deeper understanding of the program, which is a combination of original code and pragmas. Naturally, these programs can be considered as sequence data, for which large language models (LLMs) can help. In addition, these programs can be compiled and converted into a control data flow graph (CDFG), and the compiler also provides fine-grained alignment between the code tokens and the CDFG nodes. However, existing works either fail to leverage both modalities or combine the two in shallow or coarse ways. We propose ProgSG, allowing the source code sequence modality and the graph modality to interact with each other in a deep and fine-grained way. To alleviate the scarcity of labeled designs, a pre-training method is proposed based on a suite of compiler data flow analysis tasks. Experimental results on two benchmark datasets show the superiority of ProgSG over baseline methods that either only consider one modality or combine the two without utilizing the alignment information.", "output": "ProgSG: Cross-Modality Representation Learning for Programs in Electronic Design Automation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Uncertainty-quantification methods are applied to estimate the confidence of deep-neural-network classifiers over their predictions. However, most widely used methods are known to be overconfident. We address this problem by developing an algorithm that exploits the latent-space representation of data points fed into the network, to assess the accuracy of their prediction. Using the latent-space representation generated by the fraction of the training set that the network classifies correctly, we build a statistical model that is able to capture the likelihood of a given prediction. We show on a synthetic dataset that commonly used methods are mostly overconfident.
Overconfidence also occurs for predictions made on data points that are outside the distribution that generated the training data. In contrast, our method can detect such out-of-distribution data points as inaccurately predicted, thus aiding in the automatic detection of outliers.", "output": "Uncertainty Quantification in Deep Neural Networks through Statistical Inference on Latent Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrumental tracks from scratch, or based on user-provided source tracks. Considering the diverse and flexible combination between source and target tracks, a unified model capable of generating any arbitrary tracks is crucially needed. Previous works fail to address this need due to inherent constraints in music representations and model architectures. To this end, we propose a unified representation and diffusion framework named GETMusic ('GET' stands for GEnerate music Tracks), which includes a novel music representation named GETScore, and a diffusion model named GETDiff. GETScore represents notes as tokens and organizes them in a 2D structure, with tracks stacked vertically and progressing horizontally over time. During training, tracks are randomly selected as either the target or source. In the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as ground truth. In the denoising process, GETDiff learns to predict the masked target tokens, conditioning on the source tracks. With separate tracks in GETScore and the non-autoregressive behavior of the model, GETMusic can explicitly control the generation of any target tracks from scratch or conditioned on source tracks. We conduct experiments on music generation involving six instrumental tracks, resulting in a total of 665 combinations. GETMusic provides high-quality results across diverse combinations and surpasses prior works proposed for some specific combinations.", "output": "GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Edge networks call for communication-efficient (low overhead) and robust distributed optimization (DO) algorithms. These are, in fact, desirable qualities for DO frameworks, such as federated edge learning techniques, in the presence of data and system heterogeneity, and in scenarios where internode communication is the main bottleneck. Although computationally demanding, Newton-type (NT) methods have been recently advocated as enablers of robust convergence rates in challenging DO problems where edge devices have sufficient computational power. Along these lines, in this work we propose Q-SHED, an original NT algorithm for DO featuring a novel bit-allocation scheme based on incremental Hessian eigenvectors quantization. The proposed technique is integrated with the recent SHED algorithm, from which it inherits appealing features like the small number of required Hessian computations, while being bandwidth-versatile at a bit-resolution level.
Our empirical evaluation against competing approaches shows that Q-SHED can reduce the number of communication rounds required for convergence by up to 60%.", "output": "Q-SHED: Distributed Optimization at the Edge via Hessian Eigenvectors Quantization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Systems for serving inference requests on graph neural networks (GNNs) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling; and aggregating many features exhibits high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, limiting throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency with an 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).", "output": "Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved. Automatic subgoal generation (ASG) has recently emerged as a viable MARL approach inspired by utilizing subgoals in intrinsically motivated reinforcement learning. However, end-to-end learning of complex task planning from sparse rewards without prior knowledge undoubtedly requires massive training samples. Moreover, the diversity-promoting nature of existing ASG methods can lead to the \"over-representation\" of subgoals, generating numerous spurious subgoals of limited relevance to the actual task reward and thus decreasing the sample efficiency of the algorithm. To address this problem and inspired by disentangled representation learning, we propose a novel \"disentangled\" decision-making method, Semantically Aligned task decomposition in MARL (SAMA), that prompts pretrained language models with chain-of-thought prompting to suggest potential goals, provide suitable goal decomposition and subgoal allocation, as well as self-reflection-based replanning. Additionally, SAMA incorporates language-grounded RL to train each agent's subgoal-conditioned policy.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods, as evidenced by its performance on two challenging sparse-reward tasks, Overcooked and MiniRTS.", "output": "Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning on graphs is becoming prevalent in a wide range of applications including social networks, robotics, communication, medicine, etc. These datasets belonging to entities often contain critical private information. The utilization of data for graph learning applications is hampered by the growing privacy concerns from users on data sharing. Existing privacy-preserving methods pre-process the data to extract user-side features, and only these features are used for subsequent learning. Unfortunately, these methods are vulnerable to adversarial attacks to infer private attributes. We present a novel privacy-respecting framework for distributed graph learning and graph-based machine learning. In order to perform graph learning and other downstream tasks on the server side, this framework aims to learn features as well as distances without requiring actual features while preserving the original structural properties of the raw data. The proposed framework is quite generic and highly adaptable. We demonstrate the utility of the framework in Euclidean space, but it can be applied with any existing method of distance approximation and graph learning for the relevant spaces. Through extensive experimentation on both synthetic and real datasets, we demonstrate the efficacy of the framework by comparing the results obtained without data sharing to those obtained with data sharing as a benchmark. This is, to our knowledge, the first privacy-preserving distributed graph learning framework.", "output": "Free Lunch for Privacy Preserving Distributed Graph Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Considering the case where the response variable is a categorical variable and the predictor is a random function, two novel functional sufficient dimension reduction (FSDR) methods are proposed based on mutual information and square loss mutual information. Compared to the classical FSDR methods, such as functional sliced inverse regression and functional sliced average variance estimation, the proposed methods are appealing because they are capable of estimating multiple effective dimension reduction directions in the case of a relatively small number of categories, especially for the binary response. Moreover, the proposed methods do not require the restrictive linear conditional mean assumption and the constant covariance assumption. They avoid the inverse problem of the covariance operator which is often encountered in functional sufficient dimension reduction. Functional principal component analysis with truncation is used as a regularization mechanism. Under some mild conditions, the statistical consistency of the proposed methods is established.
Simulations and real data analyses demonstrate that the two methods are competitive with some existing FSDR methods.", "output": "Functional sufficient dimension reduction through information maximization with application to classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper addresses the problem of translating night-time thermal infrared images, which are the most adopted image modalities to analyze night-time scenes, to daytime color images (NTIT2DC), which provide better perceptions of objects. We introduce a novel model that focuses on enhancing the quality of the target generation without merely colorizing it. The proposed structural-aware GAN (StawGAN) enables the translation of better-shaped and high-definition objects in the target domain. We test our model on aerial images of the DroneVehicle dataset containing RGB-IR paired images. The proposed approach produces a more accurate translation with respect to other state-of-the-art image translation models. The source code is available at", "output": "StawGAN: Structural-Aware Generative Adversarial Networks for Infrared Image Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of predictive models. Despite the development of numerous recalibration algorithms, there is still a lack of a comprehensive theory that integrates calibration and sharpness (which is essential for maintaining predictive power). In this paper, we introduce the concept of minimum-risk recalibration within the framework of mean-squared-error (MSE) decomposition, offering a principled approach for evaluating and recalibrating probabilistic classifiers. Using this framework, we analyze the uniform-mass binning (UMB) recalibration method and establish a finite-sample risk upper bound of order $\\tilde{O}(B/n + 1/B^2)$ where $B$ is the number of bins and $n$ is the sample size. By balancing calibration and sharpness, we further determine that the optimal number of bins for UMB scales with $n^{1/3}$, resulting in a risk bound of approximately $O(n^{-2/3})$. Additionally, we tackle the challenge of label shift by proposing a two-stage approach that adjusts the recalibration function using limited labeled data from the target domain. Our results show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch. We validate our theoretical findings through numerical simulations, which confirm the tightness of the proposed bounds, the optimal number of bins, and the effectiveness of label shift adaptation.", "output": "Minimum-Risk Recalibration of Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Moment restrictions and their conditional counterparts emerge in many areas of machine learning and statistics ranging from causal inference to reinforcement learning. Estimators for these tasks, generally called methods of moments, include the prominent generalized method of moments (GMM) which has recently gained attention in causal inference.
GMM is a special case of the broader family of empirical likelihood estimators which are based on approximating a population distribution by means of minimizing a $\\varphi$-divergence to an empirical distribution. However, the use of $\\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting. This is achieved by defining an empirical likelihood estimator based on maximum mean discrepancy which we term the kernel method of moments (KMM). We provide a variant of our estimator for conditional moment restrictions and show that it is asymptotically first-order optimal for such problems. Finally, we show that our method achieves competitive performance on several conditional moment restriction tasks.", "output": "Estimation Beyond Data Reweighting: Kernel Method of Moments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The trustworthiness of DNNs is often challenged by their vulnerability to minor adversarial perturbations, which may not only undermine prediction accuracy (robustness) but also cause biased predictions for similar inputs (individual fairness). Accurate fairness has recently been proposed to enforce a harmonic balance between accuracy and individual fairness. It induces the notion of a fairness confusion matrix to categorize predictions as true fair, true biased, false fair, and false biased. This paper proposes a harmonic evaluation approach, RobustFair, for the accurate fairness of DNNs, using adversarial perturbations crafted through fairness confusion directed gradient search. By using Taylor expansions to approximate the ground truths of adversarial instances, RobustFair can particularly identify the robustness defects entangled with spurious fairness, which are often elusive in robustness evaluation, and missing in individual fairness evaluation. RobustFair can boost robustness and individual fairness evaluations by identifying robustness or fairness defects simultaneously. Empirical case studies on fairness benchmark datasets show that, compared with the state-of-the-art white-box robustness and individual fairness testing approaches, RobustFair detects 1.77-11.87 times more adversarial perturbations, yielding 1.83-13.12 times more biased instances and 1.53-8.22 times more false instances.
The adversarial instances can then be effectively exploited to improve the accurate fairness (and hence the accuracy and individual fairness) of the original deep neural network through retraining. The empirical case studies further show that the adversarial instances identified by RobustFair outperform those identified by the other testing approaches, promoting accurate fairness by 21% and individual fairness by 19% on multiple sensitive attributes, without losing accuracy at all, or even improving it by up to 4%.", "output": "RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions. Compared to the supervised approach, learning is more difficult since correspondences between bounding boxes and textual phrases are unavailable. In light of this, we propose the Semantic Prior Refinement Model (SPRM), whose predictions are obtained by combining the output of two main modules. The first untrained module aims to return a rough alignment between textual phrases and bounding boxes. The second trained module is composed of two sub-components that refine the rough alignment to improve the accuracy of the final phrase-bounding box alignments. The model is trained to maximize the multimodal similarity between an image and a sentence, while minimizing the multimodal similarity of the same sentence and a new unrelated image, carefully selected to help the most during training. Our approach shows state-of-the-art results on two popular datasets, Flickr30k Entities and ReferIt, shining especially on ReferIt with a 9.6% absolute improvement. Moreover, thanks to the untrained component, it reaches competitive performance using just a small fraction of training examples.", "output": "Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Query performance prediction (QPP) is a core task in information retrieval. The QPP task is to predict the retrieval quality of a search system for a query without relevance judgments. Research has shown the effectiveness and usefulness of QPP for ad-hoc search. Recent years have witnessed considerable progress in conversational search (CS). Effective QPP could help a CS system to decide an appropriate action to be taken at the next turn. Despite its potential, QPP for CS has been little studied. We address this research gap by reproducing and studying the effectiveness of existing QPP methods in the context of CS. While the task of passage retrieval remains the same in the two settings, a user query in CS depends on the conversational history, introducing novel QPP challenges. In particular, we seek to explore to what extent findings from QPP methods for ad-hoc search generalize to three CS settings: (i) estimating the retrieval quality of different query rewriting-based retrieval methods, (ii) estimating the retrieval quality of a conversational dense retrieval method, and (iii) estimating the retrieval quality for top ranks vs. deeper-ranked lists.
Our findings can be summarized as follows: (i) supervised QPP methods distinctly outperform unsupervised counterparts only when a large-scale training set is available; (ii) point-wise supervised QPP methods outperform their list-wise counterparts in most cases; and (iii) retrieval score-based unsupervised QPP methods show high effectiveness in assessing the conversational dense retrieval method, ConvDR.", "output": "Query Performance Prediction: From Ad-hoc to Conversational Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. The essence of Diff-Pruning is encapsulated in a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment, undertaken across four diverse datasets, highlights two primary benefits of our proposed method: 1) Efficiency: it enables approximately a 50% reduction in FLOPs at a mere 10% to 20% of the original training expenditure; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained progenitors. Code is available at ", "output": "Structural Pruning for Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "NLP methods can aid historians in analyzing textual materials in greater volumes than manually feasible. Developing such methods poses substantial challenges though. First, acquiring large, annotated historical datasets is difficult, as only domain experts can reliably label them. Second, most available off-the-shelf NLP models are trained on modern language texts, rendering them significantly less effective when applied to historical corpora. This is particularly problematic for less well studied tasks, and for languages other than English. This paper addresses these challenges while focusing on the under-explored task of event extraction from a novel domain of historical texts. We introduce a new multilingual dataset in English, French, and Dutch composed of newspaper ads from the early modern colonial period reporting on enslaved people who liberated themselves from enslavement.
We find that: 1) even with scarce annotated data, it is possible to achieve surprisingly good results by formulating the problem as an extractive QA task and leveraging existing datasets and models for modern languages; and 2) cross-lingual low-resource learning for historical languages is highly challenging, and machine translation of the historical datasets to the considered target languages is, in practice, often the best-performing solution.", "output": "Multilingual Event Extraction from Historical Newspaper Adverts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge, in the context of 6G networks endowed with reconfigurable intelligent surfaces (RISs). We consider a scenario where new data are continuously generated/collected by a set of devices and are handled through a dynamic queueing system. Building on the marriage between Lyapunov stochastic optimization and deep reinforcement learning (DRL), we devise a dynamic learning algorithm that jointly optimizes the data compression scheme, the allocation of radio resources (i.e., power, transmission precoding), the computation resources (i.e., CPU cycles), and the RIS reflectivity parameters (i.e., phase shifts), with the aim of performing energy-efficient edge classification under end-to-end (E2E) delay and inference accuracy constraints. The proposed strategy enables dynamic control of the system and of the wireless propagation environment, performing a low-complexity optimization on a per-slot basis while dealing with time-varying radio channels and task arrivals, whose statistics are unknown. Numerical results assess the performance of the proposed RIS-empowered edge inference strategy in terms of the trade-off between energy, delay, and accuracy of a classification task.", "output": "Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered by Reconfigurable Intelligent Surfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reducing the number of bits needed to encode the weights and activations of neural networks is highly desirable as it speeds up their training and inference time while reducing memory consumption. For these reasons, research in this area has attracted significant attention toward developing neural networks that leverage lower-precision computing, such as mixed-precision training. Interestingly, none of the existing approaches has investigated pure 16-bit floating-point settings. In this paper, we shed light on the overlooked efficiency of pure 16-bit floating-point neural networks. To this end, we provide a comprehensive theoretical analysis to investigate the factors contributing to the differences observed between 16-bit and 32-bit models. We formalize the concepts of floating-point error and tolerance, enabling us to quantitatively explain the conditions under which a 16-bit model can closely approximate the results of its 32-bit counterpart. This theoretical exploration offers a perspective that is distinct from the literature, which attributes the success of low-precision neural networks to a regularization effect. This in-depth analysis is supported by an extensive series of experiments.
Our findings demonstrate that pure 16-bit floating-point neural networks can achieve similar or even better performance than their mixed-precision and 32-bit counterparts. We believe the results presented in this paper will have significant implications for machine learning practitioners, offering an opportunity to reconsider using pure 16-bit networks in various applications.", "output": "In Defense of Pure 16-bit Floating-Point Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes an actor-critic algorithm for controlling the temperature of a battery pack using a cooling fluid. This is modeled by a coupled 1D partial differential equation (PDE) with a controlled advection term that determines the speed of the cooling fluid. The Hamilton-Jacobi-Bellman (HJB) equation is a PDE that evaluates the optimality of the value function and determines an optimal controller. We propose an algorithm that treats the value network as a Physics-Informed Neural Network (PINN) to solve the continuous-time HJB equation rather than a discrete-time Bellman optimality equation, and we derive an optimal controller for the environment that we exploit to achieve optimal control. Our experiments show that a hybrid-policy method that updates the value network using the HJB equation and updates the policy network identically to PPO achieves the best results in the control of this PDE system.", "output": "Actor-Critic Methods using Physics-Informed Neural Networks: Control of a 1D PDE Model for Fluid-Cooled Battery Packs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse Neural Networks (SNNs) can potentially demonstrate similar performance to their dense counterparts while saving significant energy and memory at inference. However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions. While recent works mitigate this issue through sophisticated pruning techniques, we shift our focus to an overlooked factor: hyperparameters and activation functions. Our analyses have shown that the accuracy drop can additionally be attributed to (i) using ReLU as the default choice for activation functions unanimously, and (ii) fine-tuning SNNs with the same hyperparameters as their dense counterparts. Thus, we focus on learning a novel way to tune activation functions for sparse networks and combining these with a separate hyperparameter optimization (HPO) regime for sparse networks. By conducting experiments on popular DNN models (LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0) trained on MNIST, CIFAR-10, and ImageNet-16 datasets, we show that the novel combination of these two approaches, dubbed Sparse Activation Function Search (SAFS), results in up to 15.53%, 8.88%, and 6.33% absolute improvement in accuracy for LeNet-5, VGG-16, and ResNet-18 over the default training protocols, especially at high pruning ratios.
Our code can be found at ", "output": "Learning Activation Functions for Sparse Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of Policy Iteration (PI) has inspired many recent algorithms for Reinforcement Learning (RL), including several policy gradient methods that gained both theoretical soundness and empirical success on a variety of tasks. The theory of PI is rich in the context of centralized learning, but its study is still in the infant stage under the federated setting. This paper explores the federated version of Approximate PI (API) and derives its error bound, taking into account the approximation error introduced by environment heterogeneity. We theoretically prove that a proper client selection scheme can reduce this error bound. Based on the theoretical result, we propose a client selection algorithm to alleviate the additional approximation error caused by environment heterogeneity. Experiment results show that the proposed algorithm outperforms other biased and unbiased client selection methods on the federated mountain car problem by effectively selecting clients with a lower level of heterogeneity from the population distribution.", "output": "Client Selection for Federated Policy Optimization with Environment Heterogeneity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking Neural Networks (SNNs) have attracted recent interest due to their energy efficiency and biological plausibility. However, the performance of SNNs still lags behind traditional Artificial Neural Networks (ANNs), as there is no consensus on the best learning algorithm for SNNs. Best-performing SNNs are based on ANN-to-SNN conversion or learning with spike-based backpropagation through surrogate gradients. The focus of recent research has been on developing and testing different learning strategies, with hand-tailored architectures and parameter tuning. Neuroevolution (NE) has proven successful as a way to automatically design ANNs and tune parameters, but its applications to SNNs are still at an early stage. DENSER is a NE framework for the automatic design and parametrization of ANNs, based on the principles of Genetic Algorithms (GA) and Structured Grammatical Evolution (SGE). In this paper, we propose SPENSER, a NE framework for SNN generation based on DENSER, for image classification on the MNIST and Fashion-MNIST datasets. SPENSER generates competitive performing networks with a test accuracy of 99.42% and 91.65%, respectively.", "output": "SPENSER: Towards a NeuroEvolutionary Approach for Convolutional Spiking Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models trained with Differential Privacy (DP) are increasingly used to produce synthetic data while reducing privacy risks. Navigating their specific privacy-utility tradeoffs makes it challenging to determine which models would work best for specific settings/tasks. In this paper, we fill this gap in the context of tabular data by analyzing how DP generative models distribute privacy budgets across rows and columns, arguably the main source of utility degradation.
We examine the main factors contributing to how privacy budgets are spent, including underlying modeling techniques, DP mechanisms, and data dimensionality. Our extensive evaluation of both graphical and deep generative models sheds light on the distinctive features that render them suitable for different settings and tasks. We show that graphical models distribute the privacy budget horizontally and thus cannot handle relatively wide datasets, while the performance on the task they were optimized for monotonically increases with more data. Deep generative models spend their budget per iteration, so their behavior is less predictable with varying dataset dimensions, but could perform better if trained on more features. Also, low levels of privacy ($\\epsilon\\geq100$) could help some models generalize, achieving better results than without applying DP.", "output": "Understanding how Differentially Private Generative Models Spend their Privacy Budget."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lifelong learning agents aim to learn multiple tasks sequentially over a lifetime. This involves the ability to exploit previous knowledge when learning new tasks and to avoid forgetting. Modulating masks, a specific type of parameter isolation approach, have recently shown promise in both supervised and reinforcement learning. While lifelong learning algorithms have been investigated mainly within a single-agent approach, a question remains on how multiple agents can share lifelong learning knowledge with each other. We show that the parameter isolation mechanism used by modulating masks is particularly suitable for exchanging knowledge among agents in a distributed and decentralized system of lifelong learners. The key idea is that the isolation of specific task knowledge to specific masks allows agents to transfer only specific knowledge on demand, resulting in robust and effective distributed lifelong learning. We assume fully distributed and asynchronous scenarios with dynamic agent numbers and connectivity. An on-demand communication protocol ensures agents query their peers for specific masks to be transferred and integrated into their policies when facing each task. Experiments indicate that on-demand mask communication is an effective way to implement distributed lifelong reinforcement learning and provides a lifelong learning benefit with respect to distributed RL baselines such as DD-PPO, IMPALA, and PPO+EWC. The system is particularly robust to connection drops and demonstrates rapid learning due to knowledge exchange.", "output": "Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimal auction design is a fundamental problem in algorithmic game theory. This problem is notoriously difficult already in very simple settings. Recent work in differentiable economics showed that neural networks can efficiently learn known optimal auction mechanisms and discover interesting new ones. In an attempt to theoretically justify their empirical success, we focus on one of the first such networks, RochetNet, and a generalized version for affine maximizer auctions.
We prove that they satisfy mode connectivity, i.e., locally optimal solutions are connected by a simple, piecewise linear path such that every solution on the path is almost as good as one of the two local optima. Mode connectivity has been recently investigated as an intriguing empirical and theoretically justifiable property of neural networks used for prediction problems. Our results give the first such analysis in the context of differentiable economics, where neural networks are used directly for solving non-convex optimization problems.", "output": "Mode Connectivity in Auction Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Policy gradient algorithms are an important family of deep reinforcement learning techniques. Many past research endeavors focused on using the first-order policy gradient information to train policy networks. Different from these works, we conduct research in this paper driven by the belief that properly utilizing and controlling Hessian information associated with the policy gradient can noticeably improve the performance of policy gradient algorithms. One key piece of Hessian information that attracted our attention is the Hessian trace, which gives the divergence of the policy gradient vector field in the Euclidean policy parametric space. We set the goal to generalize this Euclidean policy parametric space into a general Riemannian manifold by introducing a metric tensor field $g_{ab}$ in the parametric space. This is achieved through newly developed mathematical tools, deep learning algorithms, and metric tensor deep neural networks (DNNs). Armed with these technical developments, we propose a new policy gradient algorithm that learns to minimize the absolute divergence in the Riemannian manifold as an important regularization mechanism, allowing the Riemannian manifold to smoothen its policy gradient vector field. The newly developed algorithm is experimentally studied on several benchmark reinforcement learning problems. Our experiments clearly show that the new metric tensor regularized algorithm can significantly outperform its counterpart that does not use our regularization technique. Additional experimental analysis further suggests that the trained metric tensor DNN and the corresponding metric tensor $g_{ab}$ can effectively reduce the absolute divergence towards zero in the Riemannian manifold.", "output": "Deep Metric Tensor Regularized Policy Gradient."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reweighted wake-sleep (RWS) is a machine learning method for performing Bayesian inference in a very general class of models. RWS draws $K$ samples from an underlying approximate posterior, then uses importance weighting to provide a better estimate of the true posterior. RWS then updates its approximate posterior towards the importance-weighted estimate of the true posterior. However, recent work [Chatterjee and Diaconis, 2018] indicates that the number of samples required for effective importance weighting is exponential in the number of latent variables. Attaining such a large number of importance samples is intractable in all but the smallest models.
Here, we develop massively parallel RWS, which circumvents this issue by drawing $K$ samples of all $n$ latent variables, and individually reasoning about all $K^n$ possible combinations of samples. While reasoning about $K^n$ combinations might seem intractable, the required computations can be performed in polynomial time by exploiting conditional independencies in the generative model. We show considerable improvements over standard \"global\" RWS, which draws $K$ samples from the full joint.", "output": "Massively Parallel Reweighted Wake-Sleep."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited -- they are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary. This paper proposes a simple, efficient policy optimization framework, Optimistic NPG, for online RL. Optimistic NPG can be viewed as simply combining the classic natural policy gradient (NPG) algorithm [Kakade, 2001] with optimistic policy evaluation subroutines to encourage exploration. For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient, and learns an $\\varepsilon$-optimal policy within $\\tilde{O}(d^2/\\varepsilon^3)$ samples, which is the first computationally efficient algorithm whose sample complexity has the optimal dimension dependence $\\tilde{\\Theta}(d^2)$. It also improves over state-of-the-art results of policy optimization algorithms [Zanette et al., 2021] by a factor of $d$. For general function approximation that subsumes linear MDPs, Optimistic NPG, to the best of our knowledge, is also the first policy optimization algorithm that achieves polynomial sample complexity for learning near-optimal policies.", "output": "Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual Question Answering (VQA) is an emerging area of interest for researchers, being a recent problem in natural language processing and image prediction. In this area, an algorithm needs to answer questions about certain images. As of the writing of this survey, 25 recent studies were analyzed. In addition, 6 datasets were analyzed, and their download links are provided. In this work, several recent pieces of research in this area were investigated and a deeper analysis and comparison among them were provided, including results, the state-of-the-art, common errors, and possible points of improvement for future researchers.", "output": "Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in artificial intelligence (AI) and machine learning (ML) algorithms, coupled with the availability of faster computing infrastructure, have enhanced the security posture of cybersecurity operations centers (defenders) through the development of ML-aided network intrusion detection systems (NIDS).
Concurrently, the abilities of adversaries to evade security have also increased with the support of AI/ML models. Therefore, defenders need to proactively prepare for evasion attacks that exploit the detection mechanisms of NIDS. Recent studies have found that the perturbation of flow-based and packet-based features can deceive ML models, but these approaches have limitations. Perturbations made to the flow-based features are difficult to reverse-engineer, while samples generated with perturbations to the packet-based features are not playable. Our methodological framework, Deep PackGen, employs deep reinforcement learning to generate adversarial packets and aims to overcome the limitations of approaches in the literature. By taking raw malicious network packets as inputs and systematically making perturbations on them, Deep PackGen camouflages them as benign packets while still maintaining their functionality. In our experiments, using publicly available data, Deep PackGen achieved an average adversarial success rate of 66.4% against various ML models and across different attack types. Our investigation also revealed that more than 45% of the successful adversarial samples were out-of-distribution packets that evaded the decision boundaries of the classifiers. The knowledge gained from our study on the adversary's ability to make specific evasive perturbations to different types of malicious packets can help defenders enhance the robustness of their NIDS against evolving adversarial attacks.", "output": "Deep PackGen: A Deep Reinforcement Learning Framework for Adversarial Network Packet Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The quantum perceptron, the variational circuit, and the Grover algorithm have been proposed as promising components for quantum machine learning. This paper presents a new quantum perceptron that combines the quantum variational circuit and the Grover algorithm. However, this does not guarantee that this quantum variational perceptron with Grover's algorithm (QVP-G) will have any advantage over its quantum variational (QVP) and classical counterparts. Here, we examine the performance of QVP and QVP-G by computing their loss functions and analyzing their accuracy on the classification task, then comparing these two quantum models to the classical perceptron (CP). The results show that our two quantum models are more efficient than CP, and our novel suggested model QVP-G outperforms the QVP, demonstrating that Grover's algorithm can be applied to the classification task and even makes the model more accurate, beyond unstructured search problems.", "output": "Simulation of a Variational Quantum Perceptron using Grover's Algorithm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error.
Building on this result, we quantitatively characterize the advantage of the considered architecture over the autoencoder without the skip connection, which relates closely to principal component analysis. We further show that our results accurately capture the learning curves on a range of real data sets.", "output": "High-dimensional Asymptotics of Denoising Autoencoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms. The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\\psi_p}$ Orlicz spaces. Using the decorrelation lemma in combination with other techniques, such as symmetrization, couplings, and chaining in the space of probability measures, we obtain new upper bounds on the generalization error, both in expectation and in high probability, and recover as special cases many of the existing generalization bounds, including the ones based on mutual information, conditional mutual information, stochastic chaining, and PAC-Bayes inequalities. In addition, the Fernique-Talagrand upper bound on the expected supremum of a subgaussian process emerges as a special case.", "output": "A unified framework for information-theoretic generalization bounds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Minimizing the difference of two submodular (DS) functions is a problem that naturally occurs in various machine learning problems. Although it is well known that a DS problem can be equivalently formulated as the minimization of the difference of two convex (DC) functions, existing algorithms do not fully exploit this connection. A classical algorithm for DC problems is called the DC algorithm (DCA). We introduce variants of DCA and its complete form (CDCA) that we apply to the DC program corresponding to DS minimization. We extend existing convergence properties of DCA, and connect them to convergence properties on the DS problem. Our results on DCA match the theoretical guarantees satisfied by existing DS algorithms, while providing a more complete characterization of convergence properties. In the case of CDCA, we obtain a stronger local minimality guarantee. Our numerical results show that our proposed algorithms outperform existing baselines on two applications: speech corpus selection and feature selection.", "output": "Difference of Submodular Minimization via DC Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inspired by the traditional partial differential equation (PDE) approach for image denoising, we propose a novel neural network architecture, referred to as NODE-ImgNet, that combines neural ordinary differential equations (NODEs) with convolutional neural network (CNN) blocks. NODE-ImgNet is intrinsically a PDE model, where the dynamic system is learned implicitly without the explicit specification of the PDE. This naturally circumvents the typical issues associated with introducing artifacts during the learning process.
By invoking such a NODE structure, which can also be viewed as a continuous variant of a residual network (ResNet) and inherits its advantage in image denoising, our model achieves enhanced accuracy and parameter efficiency. In particular, our model exhibits consistent effectiveness in different scenarios, including denoising gray and color images perturbed by Gaussian noise, as well as real-noisy images, and demonstrates superiority in learning from small image datasets.", "output": "NODE-ImgNet: a PDE-informed effective and robust model for image denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regularization plays a pivotal role in ill-posed machine learning and inverse problems. However, the fundamental comparative analysis of various regularization norms remains open. We establish a small noise analysis framework to assess the effects of norms in Tikhonov and RKHS regularizations, in the context of ill-posed linear inverse problems with Gaussian noise. This framework studies the convergence rates of regularized estimators in the small noise limit and reveals the potential instability of the conventional L2-regularizer. We resolve this instability by proposing an innovative class of adaptive fractional RKHS regularizers, which covers the L2 Tikhonov and RKHS regularizations by adjusting the fractional smoothness parameter. A surprising insight is that over-smoothing via these fractional RKHSs consistently yields optimal convergence rates, but the optimal hyper-parameter may decay too fast to be selected in practice.", "output": "Small noise analysis for Tikhonov and RKHS regularizations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inverse problems describe the task of recovering an underlying signal of interest given observables. Typically, the observables are related via some non-linear forward model applied to the underlying unknown signal. Inverting the non-linear forward model can be computationally expensive, as it often involves computing and inverting a linearization at a series of estimates. Rather than inverting the physics-based model, we instead train a surrogate forward model (emulator) and leverage modern auto-grad libraries to solve for the input within a classical optimization framework. Current methods to train emulators are done in a black box supervised machine learning fashion and fail to take advantage of any existing knowledge of the forward model. In this article, we propose a simple learned weighted average model that embeds linearizations of the forward model around various reference points into the model itself, explicitly incorporating known physics. Grounding the learned model with physics-based linearizations improves the forward modeling accuracy and provides richer physics-based gradient information during the inversion process, leading to more accurate signal recovery. We demonstrate the efficacy on an ocean acoustic tomography (OAT) example that aims to recover ocean sound speed profile (SSP) variations from acoustic observations (e.g.
eigenray arrival times) within simulation of ocean dynamics in the Gulf of Mexico.", "output": "PETAL: Physics Emulation Through Averaged Linearizations for Solving Inverse Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Past work demonstrated that using neural networks, we can extend unfinished music pieces while maintaining the music style of the musician. With recent advancements in large language models and diffusion models, we are now capable of generating comics with an interesting storyline while maintaining the art style of the artist. In this paper, we used ChatGPT to generate storylines and dialogue and then generated the comic using Stable Diffusion. We introduced a novel way to evaluate AI-generated stories, and we achieved SOTA performance on character fidelity and art style by fine-tuning Stable Diffusion using LoRA, ControlNet, etc.", "output": "Generating coherent comic with rich story using ChatGPT and Stable Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A considerable number of texts encountered daily are somehow connected with each other. For example, Wikipedia articles refer to other articles via hyperlinks, scientific papers relate to others via citations or (co)authors, while tweets relate via users that follow each other or reshare content. Hence, a graph-like structure can represent existing connections and be seen as capturing the \"context\" of the texts. The question thus arises whether extracting and integrating such context information into a language model might help facilitate a better automated understanding of the text. In this study, we experimentally demonstrate that incorporating graph-based contextualization into the BERT model enhances its performance on an example of a classification task. Specifically, on the Pubmed dataset, we observed a reduction in error from 8.51% to 7.96%, while increasing the number of parameters just by 1.6%. Our source code: ", "output": "Enriching language models with graph-based context information to better understand textual data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Contemporary face detection algorithms have to deal with many challenges such as variations in pose, illumination, and scale. A subclass of the face detection problem that has recently gained increasing attention is occluded face detection, or more specifically, the detection of masked faces. Three years on since the advent of the COVID-19 pandemic, there is still a complete lack of evidence regarding how well existing face detection algorithms perform on masked faces. This article first offers a brief review of state-of-the-art face detectors and detectors made for the masked face problem, along with a review of the existing masked face datasets.
We evaluate and compare the performances of a well-representative set of face detectors at masked face detection and conclude with a discussion on the possible contributing factors to their performance.", "output": "A Comparative Study of Face Detection Algorithms for Masked Face Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "People usually have different intents for choosing items, while their preferences under the same intent may also differ. In traditional collaborative filtering approaches, both intent and preference factors are usually entangled in the modeling process, which significantly limits the robustness and interpretability of recommendation performances. For example, the low-rating items are always treated as negative feedback while they actually could provide positive information about user intent. To this end, in this paper, we propose a two-fold representation learning approach, namely Double Disentangled Collaborative Filtering (DDCF), for personalized recommendations. The first-level disentanglement is for separating the influence factors of intent and preference, while the second-level disentanglement is performed to build independent sparse preference representations under individual intent with limited computational complexity. Specifically, we employ two variational autoencoder networks, an intent recognition network and a preference decomposition network, to learn the intent and preference factors, respectively. In this way, the low-rating items will be treated as positive samples for modeling intents and as negative samples for modeling preferences. Finally, extensive experiments on three real-world datasets and four evaluation metrics clearly validate the effectiveness and the interpretability of DDCF.", "output": "Preference or Intent? Double Disentangled Collaborative Filtering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Typical generative diffusion models rely on a Gaussian diffusion process for training the backward transformations, which can then be used to generate samples from Gaussian noise. However, real-world data often takes place in discrete-state spaces, including many scientific applications. Here, we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact (as opposed to variational) analysis. We relate the theory to the existing continuous-state Gaussian diffusion as well as other approaches to discrete diffusion, and identify the corresponding reverse-time stochastic process and score function in the continuous-time setting, and the reverse-time mapping in the discrete-time setting. As an example of this framework, we introduce ``Blackout Diffusion'', which learns to produce samples from an empty image instead of from noise. Numerical experiments on the CIFAR-10, Binarized MNIST, and CelebA datasets confirm the feasibility of our approach.
Generalizing from specific (Gaussian) forward processes to discrete-state processes without a variational approximation sheds light on how to interpret diffusion models, which we discuss.", "output": "Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Foundation models (e.g., CLIP or DINOv2) have shown their impressive learning and transferring capabilities on a wide range of visual tasks, by training on a large corpus of data and adapting to specific downstream tasks. It is, however, interesting that foundation models have not been fully explored for universal domain adaptation (UniDA), which is to learn models using labeled data in a source domain and unlabeled data in a target one, such that the learned models can successfully adapt to the target data. In this paper, we make comprehensive empirical studies of state-of-the-art UniDA methods using foundation models. We first demonstrate that, while foundation models greatly improve the performance of the baseline methods that train the models on the source data alone, existing UniDA methods generally fail to improve over the baseline. This suggests that new research efforts are necessary for UniDA using foundation models. To this end, we propose a very simple method of target data distillation on the CLIP model, which achieves consistent improvement over the baseline across all the UniDA benchmarks. Our studies are under a newly proposed evaluation metric of universal classification rate (UCR), which is threshold- and ratio-free and addresses the threshold-sensitive issue encountered when using the existing H-score metric.", "output": "Universal Domain Adaptation from Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks: audio-visual speech recognition (AVSR), code-switched speech recognition (CS-ASR), and speech translation (ST) on unseen language pairs. We design task-specific prompts, by either leveraging another large-scale model, or simply manipulating the special tokens in the default prompts. Experiments show that compared to the default prompts, our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets. In addition, our experiments reveal many interesting properties of Whisper, including its robustness to prompts, bias on accents, and the multilingual understanding in its latent space. Code is available at", "output": "Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unifying semi-supervised learning (SSL) and open-set recognition into a single learning policy would facilitate the development of cost-efficient and application-grade classifiers. However, previous attempts do not clarify the difference between unobserved novel categories (those only seen during testing) and observed novel categories (those present in unlabelled training data).
This study introduces Open-Set Learning with Augmented Category by Exploiting Unlabelled Data (Open-LACU), the first policy that generalises between both novel category types. We adapt the state-of-the-art OSR method of Margin Generative Adversarial Networks (Margin-GANs) into several Open-LACU configurations, setting the benchmarks for Open-LACU and offering unique insights into novelty detection using Margin-GANs. Finally, we highlight the significance of the Open-LACU policy by discussing the applications of semantic segmentation in remote sensing, object detection in radiology and disease identification through cough analysis. These applications include observed and unobserved novel categories, making Open-LACU essential for training classifiers in these big data domains.", "output": "Unified machine learning: Open-set learning with augmented category by exploiting unlabelled data (Open-LACU)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study online learning in repeated first-price auctions where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces censored feedback: if she wins the bid, then she is not able to observe the highest bid of the other bidders, which we assume is \\textit{iid} drawn from an unknown distribution. In this paper, we develop the first learning algorithm that achieves a near-optimal $\\widetilde{O}(\\sqrt{T})$ regret bound, by exploiting two structural properties of first-price auctions, i.e. the specific feedback structure and payoff function. The feedback in first-price auctions combines the graph feedback across actions (bids), the cross learning across contexts (private values), and a partial order over the contexts; we generalize it as the partially ordered contextual bandits. We establish both strengths and weaknesses of this framework, by showing a curious separation that a regret nearly independent of the action/context sizes is possible under stochastic contexts, but is impossible under adversarial contexts. In particular, this framework leads to an $O(\\sqrt{T}\\log^{2.5}T)$ regret for first-price auctions when the bidder's private values are \\emph{iid}. Despite the limitation of the above framework, we further exploit the special payoff function of first-price auctions to develop a sample-efficient algorithm even in the presence of adversarially generated private values. We establish an $O(\\sqrt{T}\\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for first-price auctions.", "output": "Optimal No-regret Learning in Repeated First-price Auctions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We prove a universal approximation property (UAP) for a class of ODENet and a class of ResNet, which are simplified mathematical models for deep learning systems with skip connections. The UAP can be stated as follows. Let $n$ and $m$ be the dimension of input and output data, and assume $m\\leq n$. Then we show that ODENet of width $n+m$ with any non-polynomial continuous activation function can approximate any continuous function on a compact subset of $\\mathbb{R}^n$. We also show that ResNet has the same property as the depth tends to infinity.
Furthermore, we derive the gradient of a loss function explicitly with respect to a certain tuning variable. We use this to construct a learning algorithm for ODENet. To demonstrate the usefulness of this algorithm, we apply it to a regression problem, a binary classification, and a multinomial classification on MNIST.", "output": "Universal Approximation Properties for an ODENet and a ResNet: Mathematical Analysis and Numerical Experiments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is often the case that data come with multiple views in real-world applications. Fully exploring the information of each view is significant for making data more representative. However, due to various limitations and failures in data collection and pre-processing, it is inevitable for real data to suffer from view missing and data scarcity. The coexistence of these two issues makes it more challenging to achieve the pattern classification task. Currently, to our best knowledge, few appropriate methods can handle these two issues well simultaneously. Aiming to draw more attention from the community to this challenge, we propose a new task in this paper, called few-shot partial multi-view learning, which focuses on overcoming the negative impact of the view-missing issue in the low-data regime. The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; (ii) the limited number of data exacerbates information scarcity, thus making it harder to address the view-missing issue in turn. To address these challenges, we propose a new unified Gaussian dense-anchoring method. The unified dense anchors are learned for the limited partial multi-view data, thereby anchoring them into a unified dense representation space where the influence of data scarcity and view missing can be alleviated. We conduct extensive experiments to evaluate our method. The results on Cub-googlenet-doc2vec, Handwritten, Caltech102, Scene15, Animal, ORL, tieredImagenet, and Birds-200-2011 datasets validate its effectiveness.", "output": "Few-shot Partial Multi-view Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the tremendous successes of deep neural networks (DNNs) in various applications, many fundamental aspects of deep learning remain incompletely understood, including DNN trainability. In a trainability study, one aims to discern what makes one DNN model easier to train than another under comparable conditions. In particular, our study focuses on multi-layer perceptron (MLP) models equipped with the same number of parameters. We introduce a new notion called variability to help explain the benefits of deep learning and the difficulties in training very deep MLPs. Simply put, variability of a neural network represents the richness of landscape patterns in the data space with respect to well-scaled random weights. We empirically show that variability is positively correlated to the number of activations and negatively correlated to a phenomenon called \"Collapse to Constant\", which is related but not identical to the well-known vanishing gradient phenomenon. Experiments on a small stylized model problem confirm that variability can indeed accurately predict MLP trainability.
In addition, we demonstrate that, as an activation function in MLP models, the absolute value function can offer better variability than the popular ReLU function.", "output": "Multi-layer Perceptron Trainability Explained via Variability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joint predictions, but the computational costs of training large ensembles can become prohibitive. We introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. The epinet does not fit the traditional framework of Bayesian neural networks. To accommodate development of approaches beyond BNNs, such as the epinet, we introduce the epistemic neural network (ENN) as an interface for models that produce joint predictions.", "output": "Epistemic Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. We study a family of low-complexity classifiers consisting of thresholding a random one-dimensional feature. The feature is obtained by projecting the data on a random line after embedding it into a higher-dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n times and the best classifier among those n, based on its performance on training data, is chosen. We show that this type of classifier is extremely flexible, as it is likely to approximate, to an arbitrary precision, any continuous function on a compact set as well as any boolean function on a compact set that splits the support into measurable subsets. In particular, given full knowledge of the class conditional densities, the error of these low-complexity classifiers would converge to the optimal (Bayes) error as k and n go to infinity. On the other hand, if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity. We also bound the generalization error of our random classifiers. In general, our bounds are better than those for any classifier with VC dimension greater than O(ln n). In particular, our bounds imply that, unless the number of projections n is extremely large, there is a significant advantageous gap between the generalization error of the random projection approach and that of a linear classifier in the extended space. Asymptotically, as the number of samples approaches infinity, the gap persists for any such n.
Thus, there is a potentially large gain in generalization properties by selecting parameters at random, rather than by optimization.", "output": "Optimality and complexity of classification by random projection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different input sequences, since evaluating all layers leads to overconfidence in wrong predictions (namely overthinking). This problem can potentially be solved by implementing adaptive computation time approaches, which were first designed to improve inference speed. The recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layer's index as a latent variable. However, the originally proposed exit criterion, relying on sampling from the trained posterior distribution on the probability of exiting from the $i$-th layer, introduces major variance in exit layer indices, significantly reducing the resulting model's performance. In this paper, we propose improving PonderNet with a novel deterministic Q-exit criterion and a revisited model architecture. We adapted the proposed mechanism to ALBERT and RoBERTa and compared it with recent methods for performing an early exit. We observed that the proposed changes can be considered significant improvements on the original PonderNet architecture and outperform PABEE on a wide range of GLUE tasks. In addition, we also performed an in-depth ablation study of the proposed architecture to further understand Lambda layers and their performance.", "output": "PALBERT: Teaching ALBERT to Ponder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time-to-event analysis (survival analysis) is used when the response of interest is the time until a pre-specified event occurs. Time-to-event data are sometimes discrete either because time itself is discrete or due to grouping of failure times into intervals or rounding off measurements. In addition, the failure of an individual could be one of several distinct failure types, known as competing risks (events). Most methods and software packages for survival regression analysis assume that time is measured on a continuous scale. It is well-known that naively applying standard continuous-time models with discrete-time data may result in biased estimators of the discrete-time models. The Python package PyDTS, for simulating, estimating and evaluating semi-parametric competing-risks models for discrete-time survival data, is introduced. The package implements a fast procedure that enables including regularized regression methods, such as LASSO and elastic net, among others. A simulation study showcases the flexibility and accuracy of the package.
The utility of the package is demonstrated by analysing the Medical Information Mart for Intensive Care (MIMIC-IV) dataset for prediction of hospitalization length of stay.", "output": "PyDTS: A Python Package for Discrete-Time Survival (Regularized) Regression with Competing Risks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we analyze the effect of a policy recommendation on the performance of an artificial interbank market. Financial institutions stipulate lending agreements following a public recommendation and their individual information. The former is modeled by a reinforcement learning optimal policy that maximizes the system's fitness and gathers information on the economic environment. The policy recommendation directs economic actors to create credit relationships through the optimal choice between a low interest rate or a high liquidity supply. The latter, based on the agents' balance sheet, allows determining the liquidity supply and interest rate that the banks optimally offer their clients within the market. Thanks to the combination between the public and the private signal, financial institutions create or cut their credit connections over time via a preferential attachment evolving procedure able to generate a dynamic network. Our results show that the emergence of a core-periphery interbank network, combined with a certain level of homogeneity in the size of lenders and borrowers, is essential to ensure the system's resilience. Moreover, the optimal policy recommendation obtained through reinforcement learning is crucial in mitigating systemic risk.", "output": "Reinforcement Learning Policy Recommendation for Interbank Network Stability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing body of work has shown that deep neural networks are susceptible to adversarial examples. These take the form of small perturbations applied to the model's input which lead to incorrect predictions. Unfortunately, most literature focuses on visually imperceivable perturbations to be applied to digital images that often are, by design, impossible to be deployed to physical targets. We present Adversarial Scratches: a novel L0 black-box attack, which takes the form of scratches in images, and which possesses much greater deployability than other state-of-the-art attacks. Adversarial Scratches leverage Bézier curves to reduce the dimension of the search space and possibly constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs. Results show that, often, our attack achieves a higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.", "output": "Adversarial Scratches: Deployable Attacks to CNN Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A parametric adaptive physics-informed greedy Latent Space Dynamics Identification (gLaSDI) method is proposed for accurate, efficient, and robust data-driven reduced-order modeling of high-dimensional nonlinear dynamical systems.
In the proposed gLaSDI framework, an autoencoder discovers intrinsic nonlinear latent representations of high-dimensional data, while dynamics identification (DI) models capture local latent-space dynamics. An interactive training algorithm is adopted for the autoencoder and local DI models, which enables identification of simple latent-space dynamics and enhances accuracy and efficiency of data-driven reduced-order modeling. To maximize and accelerate the exploration of the parameter space for the optimal model performance, an adaptive greedy sampling algorithm integrated with a physics-informed residual-based error indicator and random-subset evaluation is introduced to search for the optimal training samples on the fly. Further, to exploit local latent-space dynamics captured by the local DI models for an improved modeling accuracy with a minimum number of local DI models in the parameter space, a k-nearest neighbor convex interpolation scheme is employed. The effectiveness of the proposed framework is demonstrated by modeling various nonlinear dynamical problems, including Burgers equations, nonlinear heat conduction, and radial advection. The proposed adaptive greedy sampling outperforms the conventional predefined uniform sampling in terms of accuracy. Compared with the high-fidelity models, gLaSDI achieves 17 to 2,658x speed-up with 1 to 5% relative errors.", "output": "gLaSDI: Parametric Physics-informed Greedy Latent Space Dynamics Identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based models have delivered impressive results on many tasks, particularly vision and language tasks. In many model training situations, conventional configurations are typically adopted. For example, we often set the base model with hidden dimensions (i.e. model width) to be 768 and the number of transformer layers (i.e. model depth) to be 12. In this paper, we revisit these conventional configurations. Through theoretical analysis and experimental evaluation, we show that the masked autoencoder is effective in alleviating the over-smoothing issue in deep transformer training. Based on this finding, we propose Bamboo, an idea of using deeper and narrower transformer configurations, for masked autoencoder training. On ImageNet, with such a simple change in configuration, the re-designed model achieves 87.1% top-1 accuracy and outperforms SoTA models like MAE and BEiT. On language tasks, the re-designed model outperforms BERT with the default setting by 1.1 points on average, on GLUE datasets.", "output": "A Study on Transformer Configuration and Training Objective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a clinical setting it is essential that deployed image processing systems are robust to the full range of inputs they might encounter and, in particular, do not make confidently wrong predictions. The most popular approach to safe processing is to train networks that can provide a measure of their uncertainty, but these tend to fail for inputs that are far outside the training data distribution. Recently, generative modelling approaches have been proposed as an alternative; these can quantify the likelihood of a data sample explicitly, filtering out any out-of-distribution (OOD) samples before further processing is performed.
In this work, we focus on image segmentation and evaluate several approaches to network uncertainty in the far-OOD and near-OOD cases for the task of segmenting haemorrhages in head CTs. We find that all of these approaches are unsuitable for safe segmentation as they provide confidently wrong predictions when operating OOD. We propose performing full 3D OOD detection using a VQ-GAN to provide a compressed latent representation of the image and a transformer to estimate the data likelihood. Our approach successfully identifies images in both the far- and near-OOD cases. We find a strong relationship between image likelihood and the quality of a model's segmentation, making this approach viable for filtering images unsuitable for segmentation. To our knowledge, this is the first time transformers have been applied to perform OOD detection on 3D image data. Code is available at github.com/marksgraham/transformer-ood.", "output": "Transformer-based out-of-distribution detection for clinically safe segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning regression methods allow estimation of functions without unrealistic parametric assumptions. Although they can perform exceptionally well in terms of prediction error, most lack theoretical convergence rates necessary for semi-parametric efficient estimation (e.g. TMLE, AIPW) of parameters like average treatment effects. The Highly Adaptive Lasso (HAL) is the only regression method proven to converge quickly enough for a meaningfully large class of functions, independent of the dimensionality of the predictors. Unfortunately, HAL is not computationally scalable. In this paper we build upon the theory of HAL to construct the Selectively Adaptive Lasso (SAL), a new algorithm which retains HAL's dimension-free, nonparametric convergence rate but which also scales computationally to large high-dimensional datasets. To accomplish this, we prove some general theoretical results pertaining to empirical loss minimization in nested Donsker classes. Our resulting algorithm is a form of gradient tree boosting with an adaptive learning rate, which makes it fast and trivial to implement with off-the-shelf software. Finally, we show that our algorithm retains the performance of standard gradient boosting on a diverse group of real-world datasets. SAL makes semi-parametric efficient estimators practically possible and theoretically justifiable in many big data settings.", "output": "The Selectively Adaptive Lasso."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks while addressing the limitations of standard deep learning approaches, such as catastrophic forgetting. In this work, we investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task (MTL) agents. We pose two hypotheses: (1) task-agnostic methods might provide advantages in settings with limited data, computation, or high dimensionality, and (2) faster adaptation may be particularly beneficial in continual learning settings, helping to mitigate the effects of catastrophic forgetting. To investigate these hypotheses, we introduce a replay-based recurrent reinforcement learning (3RL) methodology for task-agnostic CL agents.
We assess 3RL on a synthetic task and the Meta-World benchmark, which includes 50 unique manipulation tasks. Our results demonstrate that 3RL outperforms baseline methods and can even surpass its multi-task equivalent in challenging settings with high dimensionality. We also show that the recurrent task-agnostic agent consistently outperforms or matches the performance of its transformer-based counterpart. These findings provide insights into the advantages of task-agnostic CL over task-aware MTL approaches and highlight the potential of task-agnostic methods in resource-constrained, high-dimensional, and multi-task environments.", "output": "Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph contrastive learning has emerged as a powerful tool for unsupervised graph representation learning. The key to the success of graph contrastive learning is to acquire high-quality positive and negative samples as contrasting pairs for the purpose of learning underlying structural semantics of the input graph. Recent works usually sample negative samples from the same training batch with the positive samples, or from an external irrelevant graph. However, a significant limitation lies in such strategies, which is the unavoidable problem of sampling false negative samples. In this paper, we propose a novel method to utilize the \\textbf{C}ounterfactual mechanism to generate artificial hard negative samples for \\textbf{G}raph \\textbf{C}ontrastive learning, namely \\textbf{CGC}, which has a different perspective compared to those sampling-based strategies. We utilize the counterfactual mechanism to produce hard negative samples, which ensures that the generated samples are similar to, but have labels different from, the positive sample. The proposed method achieves satisfying results on several datasets compared to some traditional unsupervised graph learning methods and some SOTA graph contrastive learning methods. We also conduct some supplementary experiments to give an extensive illustration of the proposed method, including the performances of CGC with different hard negative samples and evaluations for hard negative samples generated with different similarity measurements.", "output": "Generating Counterfactual Hard Negative Samples for Graph Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternates between constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound.
Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards, while discovering more diverse strategies, and often with better sample efficiency.", "output": "DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Changes to hyperparameters can have a dramatic effect on model accuracy. Thus, the tuning of hyperparameters plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the evaluation of model checkpoints, which is done through the use of \"validators\". In a supervised setting, these validators evaluate checkpoints by computing accuracy on a validation set that has labels. In contrast, in an unsupervised setting, the validation set has no such labels. Without any labels, it is impossible to compute accuracy, so validators must estimate accuracy instead. But what is the best approach to estimating accuracy? In this paper, we consider this question in the context of unsupervised domain adaptation (UDA). Specifically, we propose three new validators, and we compare and rank them against five other existing validators, on a large dataset of 1,000,000 checkpoints. Extensive experimental results show that two of our proposed validators achieve state-of-the-art performance in various settings. Finally, we find that in many cases, the state-of-the-art is obtained by a simple baseline method. To the best of our knowledge, this is the largest empirical study of UDA validators to date. Code is available at", "output": "Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have seen a surge in research on dynamic graph representation learning, which aims to model temporal graphs that are dynamic and evolving constantly over time. However, current work typically models graph dynamics with recurrent neural networks (RNNs), making them suffer seriously from computation and memory overheads on large temporal graphs. So far, scalability of dynamic graph representation learning on large temporal graphs remains one of the major challenges. In this paper, we present a scalable framework, namely SpikeNet, to efficiently capture the temporal and structural patterns of temporal graphs. We explore a new direction in that we can capture the evolving dynamics of temporal graphs with spiking neural networks (SNNs) instead of RNNs. As a low-power alternative to RNNs, SNNs explicitly model graph dynamics as spike trains of neuron populations and enable spike-based propagation in an efficient way. Experiments on three large real-world temporal graph datasets demonstrate that SpikeNet outperforms strong baselines on the temporal node classification task with lower computational costs.
Particularly, SpikeNet generalizes to a large temporal graph (2.7M nodes and 13.9M edges) with significantly fewer parameters and computation overheads. Our code is publicly available at \\url{", "output": "Scaling Up Dynamic Graph Representation Learning via Spiking Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate a primal-dual (PD) method for the saddle point problem (SPP) that uses a linear approximation of the primal function instead of the standard proximal step, resulting in a linearized PD (LPD) method. For convex-strongly concave SPP, we observe that the LPD method has a suboptimal dependence on the Lipschitz constant of the primal function. To fix this issue, we combine features of Accelerated Gradient Descent with the LPD method, resulting in a single-loop Accelerated Linearized Primal-Dual (ALPD) method. The ALPD method achieves the optimal gradient complexity when the SPP has a semi-linear coupling function. We also present an inexact ALPD method for SPPs with a general nonlinear coupling function that maintains the optimal gradient evaluations of the primal parts and significantly improves the gradient evaluations of the coupling term compared to the ALPD method. We verify our findings with numerical experiments.", "output": "Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which this mutual information about the classification label is distributed among the individual neurons is not well understood: While parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these different contributions. From this, we introduce the measure of \"Representational Complexity\", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST and CIFAR10 tasks, we observe that representational complexity decreases both through successive hidden layers and over training, and compare the results to related measures. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure and evolution of neural representations and complex systems in general.", "output": "A Measure of the Complexity of Neural Representations based on Partial Information Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The problem of sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
To reduce its computational and implementation cost, we propose a compression method that enables blind recovery from far fewer measurements with respect to the full received signal in time. The proposed compression measures the signal through a filter followed by a subsampling, allowing for a significant reduction in implementation cost. We derive theoretical guarantees for the identifiability and recovery of a sparse filter from compressed measurements. Our results allow for the design of a wide class of compression filters. We then propose a data-driven unrolled learning framework to learn the compression filter and solve the S-MBD problem. The encoder is a recurrent inference network that maps compressed measurements into an estimate of sparse filters. We demonstrate that our unrolled learning method is more robust to choices of source shapes and has better recovery performance compared to optimization-based methods. Finally, in data-limited applications (few-shot learning), we highlight the superior generalization capability of unrolled learning compared to conventional deep learning.", "output": "Unrolled Compressed Blind-Deconvolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Energy consumption from selecting, training and deploying deep learning models has continued to increase over the past few years. Our goal in this work is to support the design of energy-efficient deep learning models that are easier to train with lower compute resources, practical to deploy in real-world edge/mobile computing settings and environmentally sustainable. Tabular benchmarks for neural architecture search (NAS) allow the evaluation of NAS strategies at lower computational cost by providing pre-computed performance statistics. In this work, we suggest including energy efficiency as an additional performance criterion to NAS and present an updated tabular benchmark by including information on energy consumption and carbon footprint for different architectures. The benchmark called EC-NAS is made available open-source to support energy consumption-aware NAS research. EC-NAS also includes a surrogate model for predicting energy consumption, and helps us reduce the overall energy cost of creating this dataset. We demonstrate the usefulness of EC-NAS by applying multi-objective optimisation algorithms that reveal the trade-off between energy consumption and accuracy, showing that it is possible to discover energy-efficient architectures with little to no loss in performance.", "output": "EC-NAS: Energy Consumption Aware Tabular Benchmarks for Neural Architecture Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural closure models have recently been proposed as a method for efficiently approximating small scales in multiscale systems with neural networks. The choice of loss function and associated training procedure has a large effect on the accuracy and stability of the resulting neural closure model. In this work, we systematically compare three distinct procedures: \"derivative fitting\", \"trajectory fitting\" with discretise-then-optimise, and \"trajectory fitting\" with optimise-then-discretise.
Derivative fitting is conceptually the simplest and computationally the most efficient approach and is found to perform reasonably well on one of the test problems (Kuramoto-Sivashinsky) but poorly on the other (Burgers). Trajectory fitting is computationally more expensive but is more robust and is therefore the preferred approach. Of the two trajectory fitting procedures, the discretise-then-optimise approach produces more accurate models than the optimise-then-discretise approach. While the optimise-then-discretise approach can still produce accurate models, care must be taken in choosing the length of the trajectories used for training, in order to train the models on long-term behaviour while still producing reasonably accurate gradients during training. Two existing theorems are interpreted in a novel way that gives insight into the long-term accuracy of a neural closure model based on how accurate it is in the short term.", "output": "Comparison of neural closure models for discretised PDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray microbeam (XRMB) dataset is one of various datasets that provide articulatory recordings synced with audio recordings. The XRMB articulatory recordings employ pellets placed on a number of articulators which can be tracked by the microbeam. However, a significant portion of the articulatory recordings are mistracked, and have so far been unusable. In this work, we present a deep learning-based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. Our model is able to reconstruct articulatory trajectories that closely match ground truth, even when three out of eight articulators are mistracked, and retrieve 3.28 out of 3.4 hours of previously unusable recordings.", "output": "Masked Autoencoders Are Articulatory Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, aiming to improve perceptual speech quality pre-processed by an SE method. We train a diffusion-based generative model by utilizing a dataset consisting of clean speech only. Then, our refiner effectively mixes clean parts newly generated via denoising diffusion restoration into the degraded and distorted parts caused by a preceding SE method, resulting in refined speech. Once our refiner is trained on a set of clean speech, it can be applied to various SE methods without additional training specialized for each SE module. Therefore, our refiner can be a versatile post-processing module w.r.t. SE methods and has high potential in terms of modularity.
Experimental results show that our method improved perceptual speech quality regardless of the preceding SE methods used.", "output": "Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the approximation of shift-invariant or equivariant functions by deep fully convolutional networks from the dynamical systems perspective. We prove that deep residual fully convolutional networks and their continuous-layer counterpart can achieve universal approximation of these symmetric functions at constant channel width. Moreover, we show that the same can be achieved by non-residual variants with at least 2 channels in each layer and convolutional kernel size of at least 2. In addition, we show that these requirements are necessary, in the sense that networks with fewer channels or smaller kernels fail to be universal approximators.", "output": "On the Universal Approximation Property of Deep Fully Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the \\emph{accumulative gradients} and hence the learning rate of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters in every shared layer well.", "output": "AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intrinsic motivation is a promising exploration technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards. There exist two technical challenges in implementing intrinsic motivation: 1) how to design a proper intrinsic objective to facilitate efficient exploration; and 2) how to combine the intrinsic objective with the extrinsic objective to help find better solutions.
In the current literature, the intrinsic objectives are all designed in a task-agnostic manner and combined with the extrinsic objective via simple addition (or used by themselves for reward-free pre-training). In this work, we show that these designs would fail in typical sparse-reward continuous control tasks. To address the problem, we propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective, and at the same time, exploit the Lagrangian method to adaptively balance the intrinsic and extrinsic objectives via a simultaneous-maximization framework. We empirically show, on multiple sparse-reward continuous control tasks, that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods. Moreover, the key techniques of our CIM can also be plugged into existing methods to boost their performance.", "output": "CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at", "output": "What learning algorithm is in-context learning? Investigations with linear models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal effect estimation from observational data is a fundamental task in empirical sciences. It becomes particularly challenging when unobserved confounders are involved in a system. This paper focuses on front-door adjustment -- a classic technique which, using observed mediators, allows one to identify causal effects even in the presence of unobserved confounding.
While the statistical properties of front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given directed acyclic graph (DAG), with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the causal graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity. This result implies an $O(n(n+m))$ delay enumeration algorithm of all front-door adjustment sets, again improving previous work by Jeong et al. by a factor of $n^3$. Moreover, we provide the first linear-time algorithm for finding a minimal front-door adjustment set. We offer implementations of our algorithms in multiple programming languages to facilitate practical usage and empirically validate their feasibility, even for large graphs.", "output": "Linear-Time Algorithms for Front-Door Adjustment in Causal Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI-enabled precision medicine promises a transformational improvement in healthcare outcomes by enabling data-driven personalized diagnosis, prognosis, and treatment. However, the well-known \"curse of dimensionality\" and the clustered structure of biomedical data together interact to present a joint challenge in the high dimensional, limited observation precision medicine regime. To overcome both issues simultaneously we propose a simple and scalable approach to joint clustering and embedding that combines standard embedding methods with a convex clustering penalty in a modular way. This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods, which we show with straightforward implementations of hierarchically clustered principal component analysis (PCA), locally linear embedding (LLE), and canonical correlation analysis (CCA). Through both numerical experiments and real-world examples, we demonstrate that our approach outperforms traditional and contemporary clustering methods on highly underdetermined problems (e.g., with just tens of observations) as well as on large sample datasets. Importantly, our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
Thus our approach improves significantly on existing methods for identifying patient subgroups in multiomics and neuroimaging data, enabling scalable and interpretable biomarkers for precision medicine.", "output": "Simple and Scalable Algorithms for Cluster-Aware Precision Medicine."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Step-by-step reasoning approaches like chain of thought (CoT) have proved to be very effective in inducing reasoning capabilities in large language models. However, the success of the CoT approach is fundamentally tied to the model size, and billion parameter-scale models are often needed to get CoT to work. In this paper, we propose a knowledge distillation approach that leverages the step-by-step CoT reasoning capabilities of larger models and distills these abilities into smaller models. In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps. We use Socratic CoT to train a combination of two small distilled models: a problem decomposer and a subproblem solver. In practice, given a new problem, the two distilled models work in sync to decompose and solve complex problems. On multiple reasoning datasets (GSM8K, StrategyQA, and SVAMP), our proposed distillation strategies boost the performance of smaller models by over 70% compared to the baselines. Finally, we investigate when Socratic CoT is an effective alternative to CoT, demonstrating cases where a much smaller model (GPT-2 large) can outperform a 10X larger model (GPT-3 6B). Our code is available here:", "output": "Distilling Reasoning Capabilities into Smaller Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph convolutional networks (GCNs) have achieved great success in graph representation learning by extracting high-level features from nodes and their topology. Since GCNs generally follow a message-passing mechanism, each node aggregates information from its first-order neighbours to update its representation. As a result, the representations of nodes with edges between them should be positively correlated and thus can be considered positive samples. However, there are more non-neighbour nodes in the whole graph, which provide diverse and useful information for the representation update. Two non-adjacent nodes usually have different representations, which can be seen as negative samples. Besides the node representations, the structural information of the graph is also crucial for learning. In this paper, we used quality-diversity decomposition in determinant point processes (DPP) to obtain diverse negative samples. When defining a distribution on diverse subsets of all non-neighbouring nodes, we incorporate both graph structure information and node representations. Since the DPP sampling process requires matrix eigenvalue decomposition, we propose a new shortest-path-based method to improve computational efficiency. Finally, we incorporate the obtained negative samples into the graph convolution operation. The ideas are evaluated empirically in experiments on node classification tasks.
These experiments show that the newly proposed methods not only improve the overall performance of standard representation learning but also significantly alleviate over-smoothing problems.", "output": "Graph Convolutional Neural Networks with Diverse Negative Samples via Decomposed Determinant Point Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Estimating $T_2$ relaxation time distributions from multi-echo $T_2$-weighted MRI ($T_2W$) data can provide valuable biomarkers for assessing inflammation, demyelination, edema, and cartilage composition in various pathologies, including neurodegenerative disorders, osteoarthritis, and tumors. Deep neural network (DNN) based methods have been proposed to address the complex inverse problem of estimating $T_2$ distributions from MRI data, but they are not yet robust enough for clinical data with low Signal-to-Noise ratio (SNR) and are highly sensitive to distribution shifts such as variations in echo-times (TE) used during acquisition. Consequently, their application is hindered in clinical practice and large-scale multi-institutional trials with heterogeneous acquisition protocols. We propose a physically-primed DNN approach, called $P_2T_2$, that incorporates the signal decay forward model in addition to the MRI signal into the DNN architecture to improve the accuracy and robustness of $T_2$ distribution estimation. We evaluated our $P_2T_2$ model in comparison to both DNN-based methods and classical methods for $T_2$ distribution estimation using 1D and 2D numerical simulations along with clinical data. Our model improved the baseline model's accuracy for low SNR levels ($SNR<80$), which are common in the clinical setting. Further, our model achieved a $\sim$35% improvement in robustness against distribution shifts in the acquisition process compared to previously proposed DNN models. Finally, our $P_2T_2$ model produces the most detailed Myelin-Water fraction maps compared to baseline approaches when applied to real human MRI data. Our $P_2T_2$ model offers a reliable and precise means of estimating $T_2$ distributions from MRI data and shows promise for use in large-scale multi-institutional trials with heterogeneous acquisition protocols.", "output": "P2T2: a Physically-primed deep-neural-network approach for robust $T_{2}$ distribution estimation from quantitative $T_{2}$-weighted MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking Neural Networks (SNNs) have emerged as a promising alternative to traditional Deep Neural Networks for low-power computing. However, the effectiveness of SNNs is not solely determined by their performance but also by their energy consumption, prediction speed, and robustness to noise. The recent method Fast & Deep, along with others, achieves fast and energy-efficient computation by constraining neurons to fire at most once. Known as Time-To-First-Spike (TTFS), this constraint however restricts the capabilities of SNNs in many aspects. In this work, we explore the relationships between performance, energy consumption, speed and stability when using this constraint. More precisely, we highlight the existence of tradeoffs where performance and robustness are gained at the cost of sparsity and prediction latency.
To improve these tradeoffs, we propose a relaxed version of Fast & Deep that allows for multiple spikes per neuron. Our experiments show that relaxing the spike constraint provides higher performance while also benefiting from faster convergence, similar sparsity, comparable prediction latency, and better robustness to noise compared to TTFS SNNs. By highlighting the limitations of TTFS and demonstrating the advantages of unconstrained SNNs we provide valuable insight for the development of effective learning strategies for neuromorphic computing.", "output": "Exploring Tradeoffs in Spiking Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to replicate predictions by machine learning (ML) or artificial intelligence (AI) models and results in scientific workflows that incorporate such ML/AI predictions is driven by numerous factors. An uncertainty-aware metric that can quantitatively assess the reproducibility of quantities of interest (QoI) would contribute to the trustworthiness of results obtained from scientific workflows involving ML/AI models. In this article, we discuss how uncertainty quantification (UQ) in a Bayesian paradigm can provide a general and rigorous framework for quantifying reproducibility for complex scientific workflows. Such a framework has the potential to fill a critical gap that currently exists in ML/AI for scientific workflows, as it will enable researchers to determine the impact of ML/AI model prediction variability on the predictive outcomes of ML/AI-powered workflows. We expect that the envisioned framework will contribute to the design of more reproducible and trustworthy workflows for diverse scientific applications, and ultimately, accelerate scientific discoveries.", "output": "A Rigorous Uncertainty-Aware Quantification Framework Is Essential for Reproducible and Replicable Machine Learning Workflows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving the predictive capability and computational cost of dynamical models is often at the heart of augmenting computational physics with machine learning (ML). However, most learning results are limited in interpretability and generalization over different computational grid resolutions, initial and boundary conditions, domain geometries, and physical or problem-specific parameters. In the present study, we simultaneously address all these challenges by developing the novel and versatile methodology of unified neural partial delay differential equations. We augment existing/low-fidelity dynamical models directly in their partial differential equation (PDE) forms with both Markovian and non-Markovian neural network (NN) closure parameterizations. The melding of the existing models with NNs in the continuous spatiotemporal space followed by numerical discretization automatically allows for the desired generalizability. The Markovian term is designed to enable extraction of its analytical form and thus provides interpretability. The non-Markovian terms allow accounting for inherently missing time delays needed to represent the real world.
We obtain adjoint PDEs in the continuous form, thus enabling direct implementation across differentiable and non-differentiable computational physics codes, different ML frameworks, and treatment of nonuniformly-spaced spatiotemporal training data. We demonstrate the new generalized neural closure models (gnCMs) framework using four sets of experiments based on advecting nonlinear waves, shocks, and ocean acidification models. Our learned gnCMs discover missing physics, find leading numerical error terms, discriminate among candidate functional forms in an interpretable fashion, achieve generalization, and compensate for the lack of complexity in simpler models. Finally, we analyze the computational advantages of our new framework.", "output": "Generalized Neural Closure Models with Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates a stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategies that avoid vanishing and exploding gradients in wide networks. Experiments with deep fully-connected and convolutional networks show that this strategy improves SGD and Adam training of deep maxout networks. In addition, we obtain refined bounds on the expected number of linear regions, results on the expected curve length distortion, and results on the NTK.", "output": "Expected Gradients of Maxout Networks and Consequences to Parameter Initialization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A solar sail is one of the most promising space exploration systems because of its theoretically infinite specific impulse using solar radiation pressure (SRP). Recently, some researchers proposed \"transformable spacecrafts\" that can actively reconfigure their body configurations with actuatable joints. The transformable spacecrafts are expected to greatly enhance orbit and attitude control capability due to their high redundancy in control degrees of freedom if they are used like solar sails. However, their large number of inputs poses difficulties in control, and therefore, previous researchers imposed strong constraints to limit their potential control capabilities. This paper addresses novel attitude control techniques for the transformable spacecrafts under SRP. The authors have constructed two proposed methods; one of those is a joint angle optimization to acquire arbitrary SRP force and torque, and the other is a momentum damping control driven by joint angle actuation. Our proposed methods are formulated in general forms and applicable to any transformable spacecraft that has front faces that can dominantly receive SRP on each body. The validity of the proposed methods is confirmed by numerical simulations.
This paper contributes to making the most of the high control redundancy of transformable spacecrafts without consuming any expendable propellants, which is expected to greatly enhance orbit and attitude control capability.", "output": "Optimization of body configuration and joint-driven attitude stabilization for transformable spacecrafts under solar radiation pressure."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Building recommendation systems via federated learning (FL) is a new emerging challenge for advancing next-generation Internet service and privacy protection. Existing approaches train shared item embedding by FL while keeping the user embedding private on the client side. However, item embedding identical for all clients cannot capture users' individual differences in perceiving the same item and thus leads to poor personalization. Moreover, dense item embedding in FL results in expensive communication cost and latency. To address these challenges, we propose Federated Recommendation with Additive Personalization (FedRAP), which learns a global view of items via FL and a personalized view locally on each user. FedRAP enforces sparsity of the global view to save FL's communication cost and encourages difference between the two views through regularization. We propose an effective curriculum to learn the local and global views progressively with increasing regularization weights. To produce recommendations for a user, FedRAP adds the two views together to obtain a personalized item embedding. FedRAP achieves the best performance in the FL setting on multiple benchmarks. It outperforms recent federated recommendation methods and several ablation study baselines.", "output": "Federated Recommendation with Additive Personalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The generation of molecules with desired properties has gained tremendous popularity, revolutionizing the way scientists design molecular structures and providing valuable support for chemical and drug design. However, despite the potential of language models in molecule generation, they face numerous challenges such as the generation of syntactically or chemically flawed molecules, narrow domain focus, and limitations in creating diverse and directionally feasible molecules due to a dearth of annotated data or external molecular databases. To this end, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. MolGen acquires intrinsic structural and grammatical insights by reconstructing over 100 million molecular SELFIES, while facilitating knowledge transfer between different domains through domain-agnostic molecular prefix tuning. Moreover, we present a self-feedback paradigm that inspires the pre-trained model to align with the ultimate goal of producing molecules with desirable properties. Extensive experiments demonstrate that MolGen achieves superior performance on well-known molecule generation benchmarks. Further analysis shows that MolGen can accurately capture molecule distributions, implicitly learn their structural characteristics, and efficiently explore chemical space.
The pre-trained model, codes, and datasets are publicly available for future research at ", "output": "Domain-Agnostic Molecular Generation with Self-feedback."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Noisy partial label learning (noisy PLL) is an important branch of weakly supervised learning. Unlike PLL, where the ground-truth label must be concealed in the candidate label set, noisy PLL relaxes this constraint and allows the ground-truth label to be absent from the candidate label set. To address this challenging problem, most of the existing works attempt to detect noisy samples and estimate the ground-truth label for each noisy sample. However, detection errors are unavoidable. These errors can accumulate during training and continuously affect model optimization. To this end, we propose a novel framework for noisy PLL with theoretical guarantees, called \"Adjusting Label Importance Mechanism (ALIM)\". It aims to reduce the negative impact of detection errors by trading off the initial candidate set and model outputs. ALIM is a plug-in strategy that can be integrated with existing PLL approaches. Experimental results on benchmark datasets demonstrate that our method can achieve state-of-the-art performance on noisy PLL. Our code can be found in the Supplementary Material.", "output": "ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to the need for problem-specific fine-tuning and neglecting task interrelations. The main obstacle in this field is the lack of a unified representation between natural language and chemical representations, complicating and limiting human-machine interaction. Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. Our model can handle chemical and natural language concurrently, without requiring expensive pre-training on single domains or task-specific models. Interestingly, sharing weights across domains remarkably improves our model when benchmarked against state-of-the-art baselines on single-domain and cross-domain tasks. In particular, sharing information across domains and tasks gives rise to large improvements in cross-domain tasks, the magnitude of which increases with scale, as measured by more than a dozen relevant metrics.
Our work suggests that such models can robustly and efficiently accelerate discovery in physical sciences by superseding problem-specific fine-tuning and enhancing human-model interactions.", "output": "Unifying Molecular and Textual Representations via Multi-task Language Modelling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizable subsets of training data. Compared to simple adaptive random subset selection baselines, existing intelligent subset selection approaches are not competitive due to the time-consuming subset selection step, which involves computing model-dependent gradients and feature embeddings and applies greedy maximization of submodular objectives. Our key insight is that removing the reliance on downstream model parameters enables subset selection as a pre-processing step and enables one to train multiple models at no additional cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training while enabling superior model convergence and performance by using an easy-to-hard curriculum. Our empirical results indicate that MILO can train models $3\times - 10\times$ faster and tune hyperparameters $20\times - 75\times$ faster than full-dataset training or tuning without compromising performance.", "output": "MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider a non-convex constrained optimization problem, where the objective function is weakly convex and the constraint function is either convex or weakly convex. To solve this problem, we consider the classical switching subgradient method, which is an intuitive and easily implementable first-order method whose oracle complexity was only known for convex problems. This paper provides the first analysis on the oracle complexity of the switching subgradient method for finding a nearly stationary point of non-convex problems. Our results are derived separately for convex and weakly convex constraints. Compared to existing approaches, especially the double-loop methods, the switching subgradient method can be applied to non-smooth problems and achieves the same complexity using only a single loop, which saves the effort on tuning the number of inner iterations.", "output": "Oracle Complexity of Single-Loop Switching Subgradient Methods for Non-Smooth Weakly Convex Functional Constrained Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Research in machine learning has polarized into two general approaches for regression tasks: Transductive methods construct estimates directly from available data but are usually problem unspecific.
Inductive methods can be much more specific but generally require compute-intensive solution searches. In this work, we propose a hybrid approach and show that transductive regression principles can be meta-learned through gradient descent to form efficient in-context neural approximators by leveraging the theory of vector-valued Reproducing Kernel Banach Spaces (RKBS). We apply this approach to function spaces defined over finite and infinite-dimensional spaces (function-valued operators) and show that once trained, the Transducer can almost instantaneously capture an infinity of functional relationships given a few pairs of input and output examples and return new image estimates. We demonstrate the benefit of our meta-learned transductive approach to model complex physical systems influenced by varying external factors with little data at a fraction of the usual deep learning training computational cost for partial differential equations and climate modeling applications.", "output": "Learning Functional Transduction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.", "output": "Reinforcement Learning with History-Dependent Dynamic Contexts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tensor completion is important to many areas such as computer vision, data analysis, and signal processing. Enforcing low-rank structures on completed tensors, a category of methods known as low-rank tensor completion, has recently been studied extensively. Whilst such methods attained great success, none considered exploiting numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents the algorithms from reaching optimal accuracy. This work attempts to construct a new methodological framework called GCDTC (Generalized CP Decomposition Tensor Completion) for leveraging numerical priors and achieving higher accuracy in tensor completion. In this newly introduced framework, a generalized form of CP Decomposition is applied to low-rank tensor completion. This paper also proposes an algorithm known as SPTC (Smooth Poisson Tensor Completion) for nonnegative integer tensor completion as an instantiation of the GCDTC framework. A series of experiments on real-world data indicate that SPTC could produce results superior in completion accuracy to current state-of-the-art methods.
Related code is available in the supplemental materials.", "output": "Exploring Numerical Priors for Low-Rank Tensor Completion with Generalized CP Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fraud detection aims to identify, monitor, and prevent potentially fraudulent activities in complex data. The recent development and success in AI, especially machine learning, provides a new data-driven way to deal with fraud. From a methodological point of view, machine learning based fraud detection can be divided into two categories, i.e., conventional methods (decision tree, boosting...) and deep learning, both of which have significant limitations in terms of the lack of representation learning ability for the former and interpretability for the latter. Furthermore, due to the rarity of detected fraud cases, the associated data is usually imbalanced, which seriously degrades the performance of classification algorithms. In this paper, we propose deep boosting decision trees (DBDT), a novel approach for fraud detection based on gradient boosting and neural networks. In order to combine the advantages of both conventional methods and deep learning, we first construct soft decision tree (SDT), a decision tree structured model with neural networks as its nodes, and then ensemble SDTs using the idea of gradient boosting. In this way we embed neural networks into gradient boosting to improve its representation learning capability and meanwhile maintain the interpretability. Furthermore, aiming at the rarity of detected fraud cases, in the model training phase we propose a compositional AUC maximization approach to deal with data imbalances at the algorithm level. Extensive experiments on several real-life fraud detection datasets show that DBDT can significantly improve the performance and meanwhile maintain good interpretability. Our code is available at ", "output": "Efficient Fraud Detection Using Deep Boosting Decision Trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates group distributionally robust optimization (GDRO), with the purpose of learning a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, and demonstrate that stochastic mirror descent (SMD), using $m$ samples in each iteration, achieves an $O(m (\log m)/\epsilon^2)$ sample complexity for finding an $\epsilon$-optimal solution, which matches the $\Omega(m/\epsilon^2)$ lower bound up to a logarithmic factor. Then, we make use of techniques from online learning to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity. Specifically, we cast GDRO as a two-player game where one player simply performs SMD and the other executes an online algorithm for non-oblivious multi-armed bandits. Next, we consider a more practical scenario where the number of samples that can be drawn from each distribution is different, and propose a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates. Denote by $n_i$ the sample budget for the $i$-th distribution, and assume $n_1 \geq n_2 \geq \cdots \geq n_m$.
In the first approach, we incorporate non-uniform sampling into SMD such that the sample budget is satisfied in expectation, and prove that the excess risk of the $i$-th distribution decreases at an $O(\sqrt{n_1 \log m}/n_i)$ rate. In the second approach, we use mini-batches to meet the budget exactly and also reduce the variance in stochastic gradients, and then leverage the stochastic mirror-prox algorithm, which can exploit small variances, to optimize a carefully designed weighted GDRO problem. Under appropriate conditions, it attains an $O((\log m)/\sqrt{n_i})$ convergence rate, which almost matches the optimal $O(\sqrt{1/n_i})$ rate of only learning from the $i$-th distribution with $n_i$ samples.", "output": "Stochastic Approximation Approaches to Group Distributionally Robust Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Across domains such as medicine, employment, and criminal justice, predictive models often target labels that imperfectly reflect the outcomes of interest to experts and policymakers. For example, clinical risk assessments deployed to inform physician decision-making often predict measures of healthcare utilization (e.g., costs, hospitalization) as a proxy for patient medical need. These proxies can be subject to outcome measurement error when they systematically differ from the target outcome they are intended to measure. However, prior modeling efforts to characterize and mitigate outcome measurement error overlook the fact that the decision being informed by a model often serves as a risk-mitigating intervention that impacts the target outcome of interest and its recorded proxy. Thus, in these settings, addressing measurement error requires counterfactual modeling of treatment effects on outcomes. In this work, we study intersectional threats to model reliability introduced by outcome measurement error, treatment effects, and selection bias from historical decision-making policies. We develop an unbiased risk minimization method which, given knowledge of proxy measurement error properties, corrects for the combined effects of these challenges. We also develop a method for estimating treatment-dependent measurement error parameters when these are unknown in advance. We demonstrate the utility of our approach theoretically and via experiments on real-world data from randomized controlled trials conducted in healthcare and employment domains. As importantly, we demonstrate that models correcting for outcome measurement error or treatment effects alone suffer from considerable reliability limitations. Our work underscores the importance of considering intersectional threats to model validity during the design and evaluation of predictive models for decision support.", "output": "Counterfactual Prediction Under Outcome Measurement Error."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Out-of-distribution (OOD) detection is critical to building reliable machine learning systems in the open world. Researchers have proposed various strategies to reduce model overconfidence on OOD data. Among them, ReAct is a typical and effective technique to deal with model overconfidence, which truncates high activations to increase the gap between in-distribution and OOD. Despite its promising results, is this technique the best choice for widening the gap?
To answer this question, we leverage the variational method to find the optimal operation and verify the necessity of suppressing abnormally low and high activations and amplifying intermediate activations in OOD detection, rather than focusing only on high activations like ReAct. This motivates us to propose a novel technique called \"Variational Rectified Activation (VRA)\", which simulates these suppression and amplification operations using piecewise functions. Experimental results on multiple benchmark datasets demonstrate that our method outperforms existing post-hoc strategies. Meanwhile, VRA is compatible with different scoring functions and network architectures. Our code can be found in the Supplementary Material.", "output": "VRA: Variational Rectified Activation for Out-of-distribution Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Soft actor-critic is a successful successor to soft Q-learning. While both live under the maximum entropy framework, their relationship is still unclear. In this paper, we prove that in the limit they converge to the same solution. This is appealing since it translates the optimization from an arduous problem into an easier one. The same justification can also be applied to other regularizers such as KL divergence.", "output": "The Point to Which Soft Actor-Critic Converges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has demonstrated that robustness (to \"corruption\") can be at odds with generalization. Adversarial training, for instance, aims to reduce the problematic susceptibility of modern neural networks to small data perturbations. Surprisingly, overfitting is a major concern in adversarial training despite being mostly absent in standard training. We provide here theoretical evidence for this peculiar \"robust overfitting\" phenomenon. Subsequently, we advance a novel distributionally robust loss function bridging robustness and generalization. We demonstrate, both theoretically and empirically, that the loss enjoys a certified level of robustness against two common types of corruption--data evasion and poisoning attacks--while ensuring guaranteed generalization. We show through careful numerical experiments that our resulting holistic robust (HR) training procedure yields SOTA performance. Finally, we indicate that HR training can be interpreted as a direct extension of adversarial training and comes with a negligible additional computational burden. A ready-to-use python library implementing our algorithm is available at ", "output": "Certified Robust Neural Networks: Generalization and Corruption Resistance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning image processing models have had remarkable success in recent years in generating high quality images. Particularly, the Improved Denoising Diffusion Probabilistic Models (DDPM) have shown superiority in image quality to the state-of-the-art generative models, which motivated us to investigate their capability in the generation of synthetic electrocardiogram (ECG) signals.
In this work, synthetic ECG signals are generated by the Improved DDPM and by the Wasserstein GAN with Gradient Penalty (WGAN-GP) models and then compared. To this end, we devise a pipeline to utilize DDPM in its original $2D$ form. First, the $1D$ ECG time series data are embedded into the $2D$ space, for which we employed the Gramian Angular Summation/Difference Fields (GASF/GADF) as well as Markov Transition Fields (MTF) to generate three $2D$ matrices from each ECG time series, which, when put together, form a $3$-channel $2D$ datum. Then $2D$ DDPM is used to generate $2D$ $3$-channel synthetic ECG images. The $1D$ ECG signals are created by de-embedding the $2D$ generated image files back into the $1D$ space. This work focuses on unconditional models and the generation of Normal Sinus Beat ECG signals exclusively, where the Normal Sinus Beat class from the MIT-BIH Arrhythmia dataset is used in the training phase. The quality, distribution, and the authenticity of the generated ECG signals by each model are quantitatively evaluated and compared. Our results show that in the proposed pipeline and in the particular setting of this paper, the WGAN-GP model is consistently superior to DDPM in all the considered metrics.", "output": "Synthetic ECG Signal Generation using Probabilistic Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was present in 83.56% of all participants. CVD was more common in male participants. Consequently, male participants had a higher blood pressure distribution than females. Diabetes mellitus (DM), on the other hand, did not have a gender-based inclination. Both CVD and DM had an age-based progression. Our study showed that chronic respiratory illness was more frequent in middle-aged participants than in younger or elderly individuals. Based on the data, every one in five hospitalized patients was obese. We analyzed the co-morbidities and found that 31.5% of the population has only one NCD, 30.1% has two NCDs, and 38.3% has more than two NCDs. Besides, 86.25% of all diabetic patients had cardiovascular issues. All thyroid patients in our study had CVD. Using a t-test, we found a relationship between CKD and thyroid (p-value 0.061). Males under 35 years have a statistically significant relationship between thyroid and chronic respiratory diseases (p-value 0.018). We also found an association between DM and CKD among patients over 65 (p-value 0.038). Moreover, there has been a statistically significant relationship between CKD and Thyroid (P < 0.05) for those below 35 and 35-65. We used a two-way ANOVA test to find the statistically significant interaction of heart issues and chronic respiratory illness, in combination with diabetes.
The combination of DM and RTI also affected CKD in male patients over 65 years old.", "output": "Prevalence and Major Risk Factors of Non-communicable Diseases: A Machine Learning based Cross-Sectional Study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has shown the promise of creating generalist, transformer-based models for language, vision, and sequential decision-making problems. To create such models, we generally require centralized training objectives, data, and compute. It is of interest if we can more flexibly create generalist policies by merging together multiple, task-specific, individually trained policies. In this work, we take a preliminary step in this direction through merging, or averaging, subsets of Decision Transformers in parameter space trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also show that when merging policies, we can obtain better results if all policies start from common, pre-trained initializations. We also find improvements from larger pre-trained models, and from utilizing Fisher information for merging. In general, we believe research in this direction could help democratize and distribute the process by which generally capable models are formed.", "output": "Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present a robust incremental learning model for regression tasks on temporal tabular datasets. Using commonly available tabular and time-series prediction models as building blocks, a machine-learning model is built incrementally to adapt to distributional shifts in data. Using the concept of self-similarity, the model uses only two basic building blocks of machine learning models, gradient boosting decision trees and neural networks, to build models for any required complexity. The model is efficient as no specialised neural architectures are used and each model building block can be independently trained in parallel. The model is demonstrated to have robust performance under adverse situations such as regime changes, fat-tailed distributions and low signal-to-noise ratios. Model robustness is studied under different hyper-parameters and complexities.", "output": "Robust incremental learning pipelines for temporal tabular datasets with distribution shifts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hamilton-Jacobi partial differential equations (HJ PDEs) have deep connections with a wide range of fields, including optimal control, differential games, and imaging sciences. By considering the time variable to be a higher dimensional quantity, HJ PDEs can be extended to the multi-time case. In this paper, we establish a novel theoretical connection between specific optimization problems arising in machine learning and the multi-time Hopf formula, which corresponds to a representation of the solution to certain multi-time HJ PDEs.
Through this connection, we increase the interpretability of the training process of certain machine learning applications by showing that when we solve these learning problems, we also solve a multi-time HJ PDE and, by extension, its corresponding optimal control problem. As a first exploration of this connection, we develop the relation between the regularized linear regression problem and the Linear Quadratic Regulator (LQR). We then leverage our theoretical connection to adapt standard LQR solvers (namely, those based on the Riccati ordinary differential equations) to design new training approaches for machine learning. Finally, we provide some numerical examples that demonstrate the versatility and possible computational advantages of our Riccati-based approach in the context of continual learning, post-training calibration, transfer learning, and sparse dynamics identification.", "output": "Leveraging Multi-time Hamilton-Jacobi PDEs for Certain Scientific Machine Learning Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study multiclass online prediction where the learner can predict using a list of multiple labels (as opposed to just one label in the traditional setting). We characterize learnability in this model using the $b$-ary Littlestone dimension. This dimension is a variation of the classical Littlestone dimension with the difference that binary mistake trees are replaced with $(k+1)$-ary mistake trees, where $k$ is the number of labels in the list. In the agnostic setting, we explore different scenarios depending on whether the comparator class consists of single-labeled or multi-labeled functions and its tradeoff with the size of the lists the algorithm uses. We find that it is possible to achieve negative regret in some cases and provide a complete characterization of when this is possible. As part of our work, we adapt classical algorithms such as Littlestone's SOA and Rosenblatt's Perceptron to predict using lists of labels. We also establish combinatorial results for list-learnable classes, including a list online version of the Sauer-Shelah-Perles Lemma. We state our results within the framework of pattern classes -- a generalization of hypothesis classes which can represent adaptive hypotheses (i.e. functions with memory), and model data-dependent assumptions such as linear classification with margin.", "output": "List Online Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift.
In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions.", "output": "Sparse joint shift in multinomial classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Entropy measures are effective features for time series classification problems. Traditional entropy measures, such as Shannon entropy, use the probability distribution function. However, for the effective separation of time series, new entropy estimation methods are required to characterize the chaotic dynamics of the system. Our concept of Neural Network Entropy (NNetEn) is based on the classification of special datasets in relation to the entropy of the time series recorded in the reservoir of the neural network. NNetEn estimates the chaotic dynamics of time series in an original way and does not take into account probability distribution functions. We propose two new classification metrics: R2 Efficiency and Pearson Efficiency. The efficiency of NNetEn is verified on the separation of two chaotic time series of sine mapping using dispersion analysis. For two close dynamic time series (r = 1.1918 and r = 1.2243), the F-ratio has reached the value of 124 and reflects the high efficiency of the introduced method in classification problems. The electroencephalography signal classification for healthy persons and patients with Alzheimer disease illustrates the practical application of the NNetEn features. Our computations demonstrate the synergistic effect of increasing classification accuracy when applying traditional entropy measures and the NNetEn concept conjointly. An implementation of the algorithms in Python is presented.", "output": "Neural Network Entropy (NNetEn): Entropy-Based EEG Signal and Chaotic Time Series Classification, Python Package for NNetEn Calculation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid shot selection for training in low-resource settings via HATELEXICON. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on shots sampled randomly.
Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.", "output": "Sociocultural knowledge is needed for selection of shots in hate speech detection tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations. However, there is a gap between the novelty of an observation and an exploration, as both the stochasticity in the environment and the agent's behavior may affect the observation. To evaluate exploratory behaviors accurately, we propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and then implement the reward with a discriminative forward model. Extensive experiments on both standard and advanced exploration tasks in MiniGrid show that DEIR quickly learns a better policy than the baselines. Our evaluations on ProcGen demonstrate both the generalization capability and the general applicability of our intrinsic reward. Our source code is available at", "output": "DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Eye movements during reading offer insights into both the reader's cognitive processes and the characteristics of the text that is being read. Hence, the analysis of scanpaths in reading has attracted increasing attention across fields, ranging from cognitive science over linguistics to computer science. In particular, eye-tracking-while-reading data has been argued to bear the potential to make machine-learning-based language models exhibit a more human-like linguistic behavior. However, one of the main challenges in modeling human scanpaths in reading is their dual-sequence nature: the words are ordered following the grammatical rules of the language, whereas the fixations are chronologically ordered. As humans do not strictly read from left-to-right, but rather skip or refixate words and regress to previous words, the alignment of the linguistic and the temporal sequence is non-trivial. In this paper, we develop Eyettention, the first dual-sequence model that simultaneously processes the sequence of words and the chronological sequence of fixations. The alignment of the two sequences is achieved by a cross-sequence attention mechanism. We show that Eyettention outperforms state-of-the-art models in predicting scanpaths. We provide an extensive within- and across-dataset evaluation on different languages.
An ablation study and qualitative analysis support an in-depth understanding of the model's behavior.", "output": "Eyettention: An Attention-based Dual-Sequence Model for Predicting Human Scanpaths during Reading."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks, which employ batch normalization and ReLU-like activation functions, suffer from instability in the early stages of training due to the high gradient induced by temporal gradient explosion. In this study, we analyze the occurrence and mitigation of gradient explosion both theoretically and empirically, and discover that the correlation between activations plays a key role in preventing the gradient explosion from persisting throughout the training. Finally, based on our observations, we propose an improved adaptive learning rate algorithm to effectively control the training instability.", "output": "The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope where different samples can have different perturbation directions (the adversarial and anti-adversarial directions) and varied perturbation bounds. Our theoretical explorations suggest that the combination of adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can be more effective in achieving better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy label learning and imbalance learning) compared with standard adversarial training. On the basis of our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample is presented. Meta learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.", "output": "Combining Adversaries with Anti-adversaries in Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It designed a novel promptable segmentation task, ensuring zero-shot image segmentation using the pre-trained model via two main modes including automatic everything and manual prompt. SAM has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging due to the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales. Meanwhile, zero-shot and efficient MIS can well reduce the annotation time and boost the development of medical image analysis. 
Hence, SAM seems to be a potential tool and its performance on large medical datasets should be further validated. We collected and sorted 52 open-source datasets, and built a large medical segmentation dataset with 16 modalities, 68 objects, and 553K slices. We conducted a comprehensive analysis of different SAM testing strategies on the so-called COSMOS 553K dataset. Extensive experiments validate that SAM performs better with manual hints like points and boxes for object perception in medical images, leading to better performance in prompt mode compared to everything mode. Additionally, SAM shows remarkable performance in some specific objects and modalities, but is imperfect or even totally fails in other situations. Finally, we analyze the influence of different factors (e.g., the Fourier-based boundary complexity and size of the segmented objects) on SAM's segmentation performance. Extensive experiments validate that SAM's zero-shot segmentation capability is not sufficient to ensure its direct application to MIS.", "output": "Segment Anything Model for Medical Images?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Kernelized Stein discrepancy (KSD) is a score-based discrepancy widely used in goodness-of-fit tests. It can be applied even when the target distribution has an unknown normalising factor, such as in Bayesian analysis. We show theoretically and empirically that the KSD test can suffer from low power when the target and the alternative distribution have the same well-separated modes but differ in mixing proportions. We propose to perturb the observed sample via Markov transition kernels, with respect to which the target distribution is invariant. This allows us to then employ the KSD test on the perturbed sample. We provide numerical evidence that with suitably chosen kernels the proposed approach can lead to a substantially higher power than the KSD test.", "output": "Using Perturbation to Improve Goodness-of-Fit Tests based on Kernelized Stein Discrepancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an environment model that can be used for planning. Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. 
PSDRL combines efficient uncertainty quantification over latent state space models with a specially tailored continual planning algorithm based on value-function approximation. Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency.", "output": "Posterior Sampling for Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities. However, their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data. Additionally, the lack of transparency of most state-of-the-art (SOTA) LLMs, which can only be accessed via APIs, impedes further fine-tuning with domain custom data. Moreover, providing private data to the LLMs' owner leads to data privacy problems. To address these challenges, we propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge without altering the LLMs' parameters. Our PKG is based on open-source \"white-box\" language models, allowing offline memory of any knowledge that LLMs require. We demonstrate that our PKG framework can enhance the performance of \"black-box\" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge.", "output": "Augmented Large Language Models with Parametric Knowledge Guiding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in self-supervised learning and neural network scaling have enabled the creation of large models, known as foundation models, which can be easily adapted to a wide range of downstream tasks. The current paradigm for comparing foundation models involves evaluating them with aggregate metrics on various benchmark datasets. This method of model comparison is heavily dependent on the chosen evaluation metric, which makes it unsuitable for situations where the ideal metric is either not obvious or unavailable. In this work, we present a methodology for directly comparing the embedding space geometry of foundation models, which facilitates model comparison without the need for an explicit evaluation metric. Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity on a per-datum basis. Further, we demonstrate how our methodology can be extended to facilitate population level model comparison. In particular, we show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics. 
We remark on the utility of this population level model comparison as a first step towards a taxonomic science of foundation models.", "output": "Comparing Foundation Models using Data Kernels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This research paper describes a real-time system for identifying American Sign Language (ASL) movements that employs modern computer vision and machine learning approaches. The suggested method makes use of the Mediapipe library for feature extraction and a Convolutional Neural Network (CNN) for ASL gesture classification. The testing results show that the suggested system can detect all ASL alphabets with an accuracy of 99.95%, indicating its potential for use in communication devices for people with hearing impairments. The proposed approach can also be applied to additional sign languages with similar hand motions, potentially increasing the quality of life for people with hearing loss. Overall, the study demonstrates the effectiveness of using Mediapipe and CNN for real-time sign language recognition, making a significant contribution to the field of computer vision and machine learning.", "output": "Mediapipe and CNNs for Real-Time ASL Gesture Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We show the convergence of Wasserstein inverse reinforcement learning for multi-objective optimizations with the projective subgradient method by formulating an inverse problem of the multi-objective optimization problem. In addition, we prove convergence of inverse reinforcement learning (maximum entropy inverse reinforcement learning, guided cost learning) with gradient descent and the projective subgradient method.", "output": "A proof of convergence of inverse reinforcement learning for multi-objective optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hawkes processes are a popular framework to model the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of the Hawkes process based on controlled differential equations (HP-CDE), by adopting the neural controlled differential equation (neural CDE) technology which is an analogue to continuous RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated preserving their uneven temporal spaces, and ii) the log-likelihood can be exactly computed. 
Moreover, as both Hawkes processes and neural CDEs were first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.", "output": "Hawkes Process Based on Controlled Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Contrastive learning-based methods, such as unsup-SimCSE, have achieved state-of-the-art (SOTA) performances in learning unsupervised sentence embeddings. However, in previous studies, each embedding used for contrastive learning derived only from one sentence instance, and we call these embeddings instance-level embeddings. In other words, each embedding is regarded as a unique class of its own, which may hurt the generalization performance. In this study, we propose IS-CSE (instance smoothing contrastive sentence embedding) to smooth the boundaries of embeddings in the feature space. Specifically, we retrieve embeddings from a dynamic memory buffer according to the semantic similarity to get a positive embedding group. Then embeddings in the group are aggregated by a self-attention operation to produce a smoothed instance embedding for further analysis. We evaluate our method on standard semantic text similarity (STS) tasks and achieve an average of 78.30%, 79.47%, 77.73%, and 79.42% Spearman's correlation on the base of BERT-base, BERT-large, RoBERTa-base, and RoBERTa-large respectively, a 2.05%, 1.06%, 1.16% and 0.52% improvement compared to unsup-SimCSE.", "output": "Instance Smoothed Contrastive Learning for Unsupervised Sentence Embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) have made remarkable strides in natural language processing, but their expanding size poses challenges in terms of computational expense and inefficiency. Conversely, Small Language Models (SLMs) are known for their efficiency but often encounter difficulties in tasks with limited capacity and training data, particularly in domain-specific scenarios. In this paper, we introduce Dr. LLaMA, a method that improves SLMs in the medical domain through generative data augmentation utilizing LLMs. The objective is to develop more efficient and capable models tailored for specialized applications. Our preliminary results on the PubMedQA dataset demonstrate that LLMs effectively refine and diversify existing question-answer pairs, leading to improved performance of a significantly smaller model after fine-tuning. The best SLM surpasses few-shot GPT-4 with under 1.6 billion parameters on PubMedQA. Our code and generated data are publicly available to facilitate further explorations.", "output": "Dr. LLaMA: Improving Small Language Models on PubMedQA via Generative Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven industrial health prognostics require rich training data to develop accurate and reliable predictive models. However, stringent data privacy laws and the abundance of edge industrial data necessitate decentralized data utilization. 
Thus, the industrial health prognostics field is well suited to significantly benefit from federated learning (FL), a decentralized and privacy-preserving learning technique. However, FL-based health prognostics tasks have hardly been investigated due to the complexities of meaningfully aggregating model parameters trained from heterogeneous data to form a high-performing federated model. Specifically, data heterogeneity among edge devices, stemming from dissimilar degradation mechanisms and unequal dataset sizes, poses a critical statistical challenge for developing accurate federated models. We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm to discriminatingly learn from heterogeneous edge data. The algorithm searches across the heterogeneous locally trained models and matches neurons with probabilistically similar feature extraction functions first, before selectively averaging them to form the federated model parameters. As the algorithm only averages similar neurons, as opposed to conventional naive averaging of coordinate-wise neurons, the distinct feature extractors of local models are carried over with less dilution to the resultant federated model. Using both cyclic degradation data of Li-ion batteries and non-cyclic data of turbofan engines, we demonstrate that the proposed method yields accuracy improvements as high as 44.5% and 39.3% for state-of-health estimation and remaining useful life estimation, respectively.", "output": "A Federated Learning-based Industrial Health Prognostics for Heterogeneous Edge Devices using Matched Feature Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Message passing neural networks (MPNNs) have been shown to suffer from the phenomenon of over-squashing that causes poor performance for tasks relying on long-range interactions. This can be largely attributed to message passing only occurring locally, over a node's immediate neighbours. Rewiring approaches attempting to make graphs 'more connected', and supposedly better suited to long-range tasks, often lose the inductive bias provided by distance on the graph since they make distant nodes communicate instantly at every layer. In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph. We also propose a delay mechanism that permits skip connections between nodes depending on the layer and their mutual distance. We validate our approach on several long-range tasks and show that it outperforms graph Transformers and multi-hop MPNNs.", "output": "DRew: Dynamically Rewired Message Passing with Delay."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) speech models such as wav2vec and HuBERT have demonstrated state-of-the-art performance on automatic speech recognition (ASR) and proved to be extremely useful in low label-resource settings. However, the success of SSL models has yet to transfer to utterance-level tasks such as speaker, emotion, and language recognition, which still require supervised fine-tuning of the SSL models to obtain good performance. 
We argue that the problem is caused by the lack of disentangled representations and an utterance-level learning objective for these tasks. Inspired by how HuBERT uses clustering to discover hidden acoustic units, we formulate a factor analysis (FA) model that uses the discovered hidden acoustic units to align the SSL features. The underlying utterance-level representations are disentangled from the content of speech using probabilistic inference on the aligned features. Furthermore, the variational lower bound derived from the FA model provides an utterance-level objective, allowing error gradients to be backpropagated to the Transformer layers to learn highly discriminative acoustic units. When used in conjunction with HuBERT's masked prediction training, our models outperform the current best model, WavLM, on all utterance-level non-semantic tasks on the SUPERB benchmark with only 20% of labeled data.", "output": "Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep reinforcement learning (RL) has been proposed to learn the tractography procedure and train agents to reconstruct the structure of the white matter without manually curated reference streamlines. While the performances reported were competitive, the proposed framework is complex, and little is still known about the role and impact of its multiple parts. In this work, we thoroughly explore the different components of the proposed framework, such as the choice of the RL algorithm, seeding strategy, the input signal and reward function, and shed light on their impact. Approximately 7,400 models were trained for this work, totalling nearly 41,000 hours of GPU time. Our goal is to guide researchers eager to explore the possibilities of deep RL for tractography by exposing what works and what does not work with this category of approach. As such, we ultimately propose a series of recommendations concerning the choice of RL algorithm, the input to the agents, the reward function and more to help future work using reinforcement learning for tractography. We also release the open source codebase, trained models, and datasets for users and researchers wanting to explore reinforcement learning for tractography.", "output": "What Matters in Reinforcement Learning for Tractography."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The widespread adoption of Internet of Things (IoT) devices in smart cities, intelligent healthcare systems, and various real-world applications has resulted in the generation of vast amounts of data, often analyzed using different Machine Learning (ML) models. Federated learning (FL) has been acknowledged as a privacy-preserving machine learning technology, where multiple parties cooperatively train ML models without exchanging raw data. However, the current FL architecture does not allow for an audit of the training process due to the various data-protection policies implemented by each FL participant. Furthermore, there is no global model verifiability available in the current architecture. This paper proposes a smart contract-based policy control for securing the Federated Learning (FL) management system. 
First, we develop and deploy a smart contract-based local training policy control on the FL participants' side. This policy control is used to verify the training process, ensuring that the evaluation process follows the same rules for all FL participants. We then enforce a smart contract-based aggregation policy to manage the global model aggregation process. Upon completion, the aggregated model and policy are stored on blockchain-based storage. Subsequently, we distribute the aggregated global model and the smart contract to all FL participants. Our proposed method uses smart policy control to manage access and verify the integrity of machine learning models. We conducted multiple experiments with various machine learning architectures and datasets, such as MNIST and CIFAR-10, to evaluate our proposed framework.", "output": "Smart Policy Control for Securing Federated Learning Management System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this dataset we provide a comprehensive collection of magnetograms (images quantifying the strength of the magnetic field) from the National Aeronautics and Space Administration's (NASA's) Solar Dynamics Observatory (SDO). The dataset incorporates data from three sources and provides SDO Helioseismic and Magnetic Imager (HMI) magnetograms of solar active regions (regions of large magnetic flux, generally the source of eruptive events) as well as labels of corresponding flaring activity. This dataset will be useful for image analysis or solar physics research related to magnetic structure, its evolution over time, and its relation to solar flares. The dataset will be of interest to those researchers investigating automated solar flare prediction methods, including supervised and unsupervised machine learning (classical and deep), binary and multi-class classification, and regression. This dataset is a minimally processed, user-configurable dataset of consistently sized images of solar active regions that can serve as a benchmark dataset for solar flare prediction research.", "output": "Solar Active Region Magnetogram Image Dataset for Studies of Space Weather."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Protecting user privacy is a major concern for many machine learning systems that are deployed at scale and collect data from a diverse population. One way to address this concern is by collecting and releasing data labels in an aggregated manner so that the information about a single user is potentially combined with others. In this paper, we explore the possibility of training machine learning models with aggregated data labels, rather than individual labels. Specifically, we consider two natural aggregation procedures suggested by practitioners: curated bags, where the data points are grouped based on common features, and random bags, where the data points are grouped randomly in bags of similar sizes. For the curated bag setting and for a broad range of loss functions, we show that we can perform gradient-based learning without any degradation in performance that may result from aggregating data. Our method is based on the observation that the sum of the gradients of the loss function on individual data examples in a curated bag can be computed from the aggregate label without the need for individual labels. 
For the random bag setting, we provide a generalization risk bound based on the Rademacher complexity of the hypothesis class and show how empirical risk minimization can be regularized to achieve the smallest risk bound. In fact, in the random bag setting, there is a trade-off between the size of the bag and the achievable error rate, as our bound indicates. Finally, we conduct a careful empirical study to confirm our theoretical findings. In particular, our results suggest that aggregate learning can be an effective method for preserving user privacy while maintaining model accuracy.", "output": "Learning from Aggregated Data: Curated Bags versus Random Bags."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural-based utility functions. However, the performance of MBR decoding depends heavily on how and how many candidates are sampled from the model. In this paper, we explore how different sampling approaches for generating candidate lists for MBR decoding affect performance. We evaluate popular sampling approaches, such as ancestral, nucleus, and top-k sampling. Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon, ensuring that each token in a sample receives a fair probability mass. Through extensive human evaluations, we demonstrate that MBR decoding based on epsilon-sampling significantly outperforms not only beam search decoding, but also MBR decoding with all other tested sampling methods across four language pairs.", "output": "Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although a multitude of NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB identities requires understanding how such identities uniquely interact with societal gender norms and how they differ from gender binary-centric perspectives. Such measurement frameworks inherently require centering TGNB voices to help guide the alignment between gender-inclusive NLP and whom they are intended to serve. Towards this goal, we ground our work in the TGNB community and existing interdisciplinary literature to assess how the social reality surrounding experienced marginalization by TGNB persons contributes to and persists within Open Language Generation (OLG). By first understanding their marginalization stressors, we evaluate (1) misgendering and (2) harmful responses to gender disclosure. To do this, we introduce the TANGO dataset, comprising template-based text curated from real-world text within a TGNB-oriented community. 
We discover a dominance of binary gender norms within the models; LLMs least misgendered subjects in generated text when triggered by prompts whose subjects used binary pronouns. Meanwhile, misgendering was most prevalent when triggering generation with singular they and neopronouns. When prompted with gender disclosures, LLM text contained stigmatizing language and scored most toxic when triggered by TGNB gender disclosure. Our findings warrant further research on how TGNB harms manifest in LLMs and serve as a broader case study toward concretely grounding the design of gender-inclusive AI in community voices and interdisciplinary literature.", "output": "\"I'm fully who I am\": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We prove that Wasserstein inverse reinforcement learning enables the learner's reward values to imitate the expert's reward values in a finite iteration for multi-objective optimizations. Moreover, we prove that Wasserstein inverse reinforcement learning enables the learner's optimal solutions to imitate the expert's optimal solutions for multi-objective optimizations with lexicographic order.", "output": "A proof of imitation of Wasserstein inverse reinforcement learning for multi-objective optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent popularity of large language models (LLMs) has brought a significant impact to boundless fields, particularly through their open-ended ecosystem such as the APIs, open-sourced models, and plugins. However, with their widespread deployment, there is a general lack of research that thoroughly discusses and analyzes the potential risks concealed. In that case, we intend to conduct a preliminary but pioneering study covering the robustness, consistency, and credibility of LLM systems. With most of the related literature in the era of LLMs uncharted, we propose an automated workflow that copes with an upscaled number of queries/responses. Overall, we conduct over a million queries to the mainstream LLMs including ChatGPT, LLaMA, and OPT. The core of our workflow consists of a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metrical systems. As a result, we draw several, and perhaps unfortunate, conclusions that are quite uncommon in this trendy community. Briefly, they are: (i) the minor but inevitable error occurrence in the user-generated query input may, by chance, cause the LLM to respond unexpectedly; (ii) LLMs possess poor consistency when processing semantically similar query input. In addition, as a side finding, we find that ChatGPT is still capable of yielding the correct answer even when the input is polluted at an extreme level. While this phenomenon demonstrates the powerful memorization of the LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic development. To deal with it, we propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation. 
Extensive empirical studies are tagged to support the aforementioned claims.", "output": "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility."}] \ No newline at end of file