From a9821f1b0bf47240832382aa31e236d14bd86c79 Mon Sep 17 00:00:00 2001 From: wangrongsheng Date: Sat, 6 May 2023 15:26:22 +0800 Subject: [PATCH] * update 2023-05-06 15:26:22 --- data/2023-05-06.json | 1 + 1 file changed, 1 insertion(+) create mode 100644 data/2023-05-06.json diff --git a/data/2023-05-06.json b/data/2023-05-06.json new file mode 100644 index 0000000..9199dc1 --- /dev/null +++ b/data/2023-05-06.json @@ -0,0 +1 @@ +[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Taking on a historical lens, this paper traces the development of cyberneticsand systems thinking back to the 1950s, when a group of interdisciplinaryscholars converged to create a new theoretical model based on machines andsystems for understanding matters of meaning, information, consciousness, andlife. By presenting a genealogy of research in the landscape architecturediscipline, the paper argues that landscape architects have been an importantpart of the development of cybernetics by materializing systems based oncybernetic principles in the environment through ecologically based landscapedesign. The landscape discipline has developed a design framework that providestransformative insights into understanding machine intelligence. The papercalls for a new paradigm of environmental engagement to understand matters ofdesign and machine intelligence.", "output": "Cybernetic Environment: A Historical Reflection on System, Design, and Machine Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There have been theory-based endeavours that directly engage with AI and MLin the landscape discipline. 
By presenting a case that uses machine learningtechniques to predict variables in a coastal environment, this paper providesempirical evidence of the forthcoming cybernetic environment, in whichdesigners are conceptualized not as authors but as choreographers, catalystagents, and conductors among many other intelligent agents. Drawing ideas fromposthumanism, this paper argues that, to truly understand the cyberneticenvironment, we have to take on posthumanist ethics and overcome humanexceptionalism.", "output": "The Future of Artificial Intelligence (AI) and Machine Learning (ML) in Landscape Design: A Case Study in Coastal Virginia, USA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates the idea of cultivated wildness at the intersectionof landscape design and artificial intelligence. The paper posits thatcontemporary landscape practices should overcome the potentially singleunderstanding on wilderness, and instead explore landscape strategies tocultivate new forms of wild places via ideas and concerns in contemporaryEnvironmental Humanities, Science and Technology Studies, Ecological Sciences,and Landscape Architecture. Drawing cases in environmental engineering,computer science, and landscape architecture research, this paper explores aframework to construct wild places with intelligent machines. In thisframework, machines are not understood as a layer of \"digital infrastructure\"that is used to extend localized human intelligence and agency. Rather machinesare conceptualized as active agents who can participate in the intelligence ofco-production. Recent developments in cybernetic technologies such as sensingnetworks, artificial intelligence, and cyberphysical systems can alsocontribute to establishing the framework. 
At the heart of this framework is\"technodiversity,\" in parallel with biodiversity, since a singular vision ontechnological development driven by optimization and efficiency reinforces amonocultural approach that eliminates other possible relationships to constructwith the environment. Thus, cultivated wildness is also about recognizing\"wildness\" in machines.", "output": "Cultivated Wildness: Technodiversity and Wildness in Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Consumers are exposed to advertisements across many different domains on theinternet, such as fashion, beauty, car, food, and others. On the other hand,fashion represents second highest e-commerce shopping category. Does consumerdigital record behavior on various fashion ad images reveal their fashiontaste? Does ads from other domains infer their fashion taste as well? In thispaper, we study the correlation between advertisements and fashion taste.Towards this goal, we introduce a new dataset, Fashionpedia-Ads, which askssubjects to provide their preferences on both ad (fashion, beauty, car, anddessert) and fashion product (social network and e-commerce style) images.Furthermore, we exhaustively collect and annotate the emotional, visual andtextual information on the ad images from multi-perspectives (abstractivelevel, physical level, captions, and brands). 
We open-source Fashionpedia-Ads to enable future studies and encourage more approaches to interpretability research between advertisements and fashion taste.", "output": "Fashionpedia-Ads: Do Your Favorite Advertisements Reveal Your Fashion Taste?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As Machine Learning models are considered for autonomous decisions with significant social impact, the need for understanding how these models work rises rapidly. Explainable Artificial Intelligence (XAI) aims to provide interpretations for predictions made by Machine Learning models, in order to make the model trustworthy and more transparent for the user. For example, selecting relevant input variables for the problem directly impacts the model's ability to learn and make accurate predictions, so obtaining information about input importance play a crucial role when training the model. One of the main XAI techniques to obtain input variable importance is the sensitivity analysis based on partial derivatives. However, existing literature of this method provide no justification of the aggregation metrics used to retrieved information from the partial derivatives. In this paper, a theoretical framework is proposed to study sensitivities of ML models using metric techniques. From this metric interpretation, a complete family of new quantitative metrics called $\\alpha$-curves is extracted. These $\\alpha$-curves provide information with greater depth on the importance of the input variables for a machine learning model than existing XAI methods in the literature.
We demonstrate the effectiveness of the $\\alpha$-curves using synthetic and real datasets, comparing the results against other XAI methods for variable importance and validating the analysis results with the ground truth or literature information.", "output": "Metric Tools for Sensitivity Analysis with Applications to Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge graph reasoning (KGR) -- answering complex logical queries over large knowledge graphs -- represents an important artificial intelligence task, entailing a range of applications (e.g., cyber threat hunting). However, despite its surging popularity, the potential security risks of KGR are largely unexplored, which is concerning, given the increasing use of such capability in security-critical domains. This work represents a solid initial step towards bridging the striking gap. We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. Further, we present ROAR, a new class of attacks that instantiate a variety of such threats. Through empirical evaluation in representative use cases (e.g., medical decision support, cyber threat hunting, and commonsense reasoning), we demonstrate that ROAR is highly effective to mislead KGR to suggest pre-defined answers for target queries, yet with negligible impact on non-target ones.
Finally, we explore potentialcountermeasures against ROAR, including filtering of potentially poisoningknowledge and training with adversarially augmented queries, which leads toseveral promising research directions.", "output": "On the Security Risks of Knowledge Graph Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Textual backdoor attack, as a novel attack model, has been shown to beeffective in adding a backdoor to the model during training. Defending againstsuch backdoor attacks has become urgent and important. In this paper, wepropose AttDef, an efficient attribution-based pipeline to defend against twoinsertion-based poisoning attacks, BadNL and InSent. Specifically, we regardthe tokens with larger attribution scores as potential triggers since largerattribution words contribute more to the false prediction results and thereforeare more likely to be poison triggers. Additionally, we further utilize anexternal pre-trained language model to distinguish whether input is poisoned ornot. We show that our proposed method can generalize sufficiently well in twocommon attack scenarios (poisoning training data and testing data), whichconsistently improves previous methods. 
For instance, AttDef can successfullymitigate both attacks with an average accuracy of 79.97% (56.59% up) and 48.34%(3.99% up) under pre-training and post-training attack defense respectively,achieving the new state-of-the-art performance on prediction recovery over fourbenchmark datasets.", "output": "Defending against Insertion-based Textual Backdoor Attacks via Attribution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing number and sophistication of malware attacks, malwaredetection systems based on machine learning (ML) grow in importance. At thesame time, many popular ML models used in malware classification are supervisedsolutions. These supervised classifiers often do not generalize well to novelmalware. Therefore, they need to be re-trained frequently to detect new malwarespecimens, which can be time-consuming. Our work addresses this problem in ahybrid framework of theoretical Quantum ML, combined with feature selectionstrategies to reduce the data size and malware classifier training time. Thepreliminary results show that VQC with XGBoost selected features can get a78.91% test accuracy on the simulator. The average accuracy for the modeltrained using the features selected with XGBoost was 74% (+- 11.35%) on the IBM5 qubits machines.", "output": "Can Feature Engineering Help Quantum Machine Learning for Malware Detection?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pre-trained large language models (LLMs) capture procedural knowledge aboutthe world. Recent work has leveraged LLM's ability to generate abstract plansto simplify challenging control tasks, either by action scoring, or actionmodeling (fine-tuning). 
However, the transformer architecture inherits severalconstraints that make it difficult for the LLM to directly serve as the agent:e.g. limited input lengths, fine-tuning inefficiency, bias from pre-training,and incompatibility with non-text environments. To maintain compatibility witha low-level trainable actor, we propose to instead use the knowledge in LLMs tosimplify the control problem, rather than solving it. We propose the Plan,Eliminate, and Track (PET) framework. The Plan module translates a taskdescription into a list of high-level sub-tasks. The Eliminate module masks outirrelevant objects and receptacles from the observation for the currentsub-task. Finally, the Track module determines whether the agent hasaccomplished each sub-task. On the AlfWorld instruction following benchmark,the PET framework leads to a significant 15% improvement over SOTA forgeneralization to human goal specifications.", "output": "Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies show that prompt tuning can better leverage the power of largelanguage models than fine-tuning on downstream natural language understandingtasks. However, the existing prompt tuning methods have training instabilityissues, as the variance of scores under different random seeds is quite large.To address this critical problem, we first investigate and find that the losslandscape of vanilla prompt tuning is precipitous when it is visualized, wherea slight change of input data can cause a big fluctuation in the losslandscape. This is an essential factor that leads to the instability of prompttuning. Based on this observation, we introduce perturbation-basedregularizers, which can smooth the loss landscape, into prompt tuning. 
Wepropose a new algorithm, called Prompt Tuning with Perturbation-basedregularizer~(PTP), which can not only alleviate training instabilitydramatically but also boost the performance of prompt tuning. We design twokinds of perturbation-based regularizers, including random-noise-based andadversarial-based. In particular, our proposed perturbations are flexible onboth text space and embedding space. Extensive experiments show theeffectiveness of our proposed methods in stabilizing the training. Our newalgorithms improve the state-of-the-art prompt tuning methods by 1.94% and2.34% on SuperGLUE and FewGLUE benchmarks, respectively.", "output": "PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With direct access to human-written reference as memory, retrieval-augmentedgeneration has achieved much progress in a wide range of text generation tasks.Since better memory would typically prompt better generation~(we define this asprimal problem), previous works mainly focus on how to retrieve better memory.However, one fundamental limitation exists for current literature: the memoryis retrieved from a fixed corpus and is bounded by the quality of the corpus.Due to the finite retrieval space, bounded memory would greatly limit thepotential of the memory-augmented generation model. In this paper, by exploringthe duality of the primal problem: better generation also prompts bettermemory, we propose a framework called Selfmem, which iteratively adopts aretrieval-augmented generator itself to generate an unbounded memory pool anduses a memory selector to pick one generated memory for the next generationround. By combining the primal and dual problem, a retrieval-augmentedgeneration model could lift itself up with its own output in the infinitegeneration space. 
To verify our framework, we conduct extensive experiments across various text generation scenarios including neural machine translation, abstractive summarization and dialogue generation over seven datasets and achieve state-of-the-art results in JRC-Acquis (four directions), XSum (50.3 ROUGE-1) and BigPatent (62.9 ROUGE-1).", "output": "Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Minimal trap spaces (MTSs) capture subspaces in which the Boolean dynamics is trapped, whatever the update mode. They correspond to the attractors of the most permissive mode. Due to their versatility, the computation of MTSs has recently gained traction, essentially by focusing on their enumeration. In this paper, we address the logical reasoning on universal properties of MTSs in the scope of two problems: the reprogramming of Boolean networks for identifying the permanent freeze of Boolean variables that enforce a given property on all the MTSs, and the synthesis of Boolean networks from universal properties on their MTSs. Both problems reduce to solving the satisfiability of quantified propositional logic formula with 3 levels of quantifiers ($\\exists\\forall\\exists$). In this paper, we introduce a Counter-Example Guided Refinement Abstraction (CEGAR) to efficiently solve these problems by coupling the resolution of two simpler formulas.
We provide a prototype relying onAnswer-Set Programming for each formula and show its tractability on a widerange of Boolean models of biological networks.", "output": "Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This is a speculative essay on interface design and artificial intelligence.Recently there has been a surge of attention to chatbots based on largelanguage models, including widely reported unsavory interactions. We contendthat part of the problem is that text is not all you need: sophisticated AIsystems should have dashboards, just like all other complicated devices.Assuming the hypothesis that AI systems based on neural networks will containinterpretable models of aspects of the world around them, we discuss what datasuch dashboards might display. We conjecture that, for many systems, the twomost important models will be of the user and of the system itself. We callthese the System Model and User Model. We argue that, for usability and safety,interfaces to dialogue-based AI systems should have a parallel display based onthe state of the System Model and the User Model. Finding ways to identify,interpret, and display these two models should be a core part of interfaceresearch for AI.", "output": "The System Model and the User Model: Exploring AI Dashboard Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most existing exoplanets are discovered using validation techniques ratherthan being confirmed by complementary observations. These techniques generate ascore that is typically the probability of the transit signal being anexoplanet (y(x)=exoplanet) given some information related to that signal(represented by x). 
Except for the validation technique in Rowe et al. (2014)that uses multiplicity information to generate these probability scores, theexisting validation techniques ignore the multiplicity boost information. Inthis work, we introduce a framework with the following premise: given anexisting transit signal vetter (classifier), improve its performance usingmultiplicity information. We apply this framework to several existingclassifiers, which include vespa (Morton et al. 2016), Robovetter (Coughlin etal. 2017), AstroNet (Shallue & Vanderburg 2018), ExoNet (Ansdel et al. 2018),GPC and RFC (Armstrong et al. 2020), and ExoMiner (Valizadegan et al. 2022), tosupport our claim that this framework is able to improve the performance of agiven classifier. We then use the proposed multiplicity boost framework forExoMiner V1.2, which addresses some of the shortcomings of the originalExoMiner classifier (Valizadegan et al. 2022), and validate 69 new exoplanetsfor systems with multiple KOIs from the Kepler catalog.", "output": "Multiplicity Boost Of Transit Signal Classifiers: Validation of 69 New Exoplanets Using The Multiplicity Boost of ExoMiner."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "International maritime crime is becoming increasingly sophisticated, oftenassociated with wider criminal networks. Detecting maritime threats by means offusing data purely related to physical movement (i.e., those generated byphysical sensors, or hard data) is not sufficient. This has led to research anddevelopment efforts aimed at combining hard data with other types of data(especially human-generated or soft data). Existing work often assumes thatinput soft data is available in a structured format, or is focused onextracting certain relevant entities or concepts to accompany or annotate harddata. 
Much less attention has been given to extracting the rich knowledge aboutthe situations of interest implicitly embedded in the large amount of soft dataexisting in unstructured formats (such as intelligence reports and newsarticles). In order to exploit the potentially useful and rich information fromsuch sources, it is necessary to extract not only the relevant entities andconcepts but also their semantic relations, together with the uncertaintyassociated with the extracted knowledge (i.e., in the form of probabilisticknowledge graphs). This will increase the accuracy of and confidence in, theextracted knowledge and facilitate subsequent reasoning and learning. To thisend, we propose Maritime DeepDive, an initial prototype for the automatedconstruction of probabilistic knowledge graphs from natural language data forthe maritime domain. In this paper, we report on the current implementation ofMaritime DeepDive, together with preliminary results on extractingprobabilistic events from maritime piracy incidents. This pipeline wasevaluated on a manually crafted gold standard, yielding promising results.", "output": "Toward the Automated Construction of Probabilistic Knowledge Graphs for the Maritime Domain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Breast cancer is one of the most threatening diseases in women's life; thus,the early and accurate diagnosis plays a key role in reducing the risk of deathin a patient's life. Mammography stands as the reference technique for breastcancer screening; nevertheless, many countries still lack access to mammogramsdue to economic, social, and cultural issues. 
Latest advances in computationaltools, infrared cameras and devices for bio-impedance quantification, havegiven a chance to emerge other reference techniques like thermography, infraredthermography, electrical impedance tomography and biomarkers found in bloodtests, therefore being faster, reliable and cheaper than other methods. In thelast two decades, the techniques mentioned above have been considered asparallel and extended approaches for breast cancer diagnosis, as well manyauthors concluded that false positives and false negatives rates aresignificantly reduced. Moreover, when a screening method works together with acomputational technique, it generates a \"computer-aided diagnosis\" system. Thepresent work aims to review the last breakthroughs about the three techniquesmentioned earlier, suggested machine learning techniques to breast cancerdiagnosis, thus, describing the benefits of some methods in relation with otherones, such as, logistic regression, decision trees, random forest, deep andconvolutional neural networks. With this, we studied several hyperparametersoptimization approaches with parzen tree optimizers to improve the performanceof baseline models. An exploratory data analysis for each database and abenchmark of convolutional neural networks for the database of thermal imagesare presented. 
The benchmark process, reviews image classification techniqueswith convolutional neural networks, like, Resnet50, NasNetmobile,InceptionResnet and Xception.", "output": "Breast Cancer Diagnosis Using Machine Learning Techniques."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In face of the pressing need of decarbonization in the power sector, there-design of electricity market is necessary as a Marco-level approach toaccommodate the high penetration of renewable generations, and to achieve powersystem operation security, economic efficiency, and environmental friendliness.However, existing market design methodologies suffer from the lack ofcoordination among energy spot market (ESM), ancillary service market (ASM) andfinancial market (FM), i.e., the \"joint market\", and the lack of reliablesimulation-based verification. To tackle these deficiencies, this two-partpaper develops a paradigmatic theory and detailed methods of the joint marketdesign using reinforcement-learning (RL)-based simulation. In Part 1, thetheory and framework of this novel market design philosophy are proposed.First, the controversial market design options while designing the joint marketare summarized as the targeted research questions. Second, the Markov gamemodel is developed to describe the bidding game in the joint market,incorporating the market design options to be determined. Third, a framework ofdeploying multiple types of RL algorithms to simulate the market model isdeveloped. Finally, several market operation performance indicators areproposed to validate the market design based on the simulation results.", "output": "How to Use Reinforcement Learning to Facilitate Future Electricity Market Design? 
Part 1: A Paradigmatic Theory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed the proliferation of traffic accidents, which ledwide researches on Automated Vehicle (AV) technologies to reduce vehicleaccidents, especially on risk assessment framework of AV technologies. However,existing time-based frameworks can not handle complex traffic scenarios andignore the motion tendency influence of each moving objects on the riskdistribution, leading to performance degradation. To address this problem, wenovelly propose a comprehensive driving risk management framework named RCP-RFbased on potential field theory under Connected and Automated Vehicles (CAV)environment, where the pedestrian risk metric are combined into a unifiedroad-vehicle driving risk management framework. Different from existingalgorithms, the motion tendency between ego and obstacle cars and thepedestrian factor are legitimately considered in the proposed framework, whichcan improve the performance of the driving risk model. Moreover, it requiresonly O(N 2) of time complexity in the proposed method. Empirical studiesvalidate the superiority of our proposed framework against state-of-the-artmethods on real-world dataset NGSIM and real AV platform.", "output": "RCP-RF: A Comprehensive Road-car-pedestrian Risk Management Framework based on Driving Risk Potential Field."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combining Graph neural networks (GNNs) with contrastive learning for anomalydetection has drawn rising attention recently. 
Existing graph contrastiveanomaly detection (GCAD) methods have primarily focused on improving detectioncapability through graph augmentation and multi-scale contrast modules.However, the underlying mechanisms of how these modules work have not beenfully explored. We dive into the multi-scale and graph augmentation mechanismand observed that multi-scale contrast modules do not enhance the expression,while the multi-GNN modules are the hidden contributors. Previous studies havetended to attribute the benefits brought by multi-GNN to the multi-scalemodules. In the paper, we delve into the misconception and propose Multi-GNNand Augmented Graph contrastive framework MAG, which unified the existing GCADmethods in the contrastive self-supervised perspective. We extracted twovariants from the MAG framework, L-MAG and M-MAG. The L-MAG is the lightweightinstance of the MAG, which outperform the state-of-the-art on Cora and Pubmedwith the low computational cost. The variant M-MAG equipped with multi-GNNmodules further improve the detection performance. Our study sheds light on thedrawback of the existing GCAD methods and demonstrates the potential ofmulti-GNN and graph augmentation modules. Our code is available at", "output": "Revisiting Graph Contrastive Learning for Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI tasks encompass a wide range of domains and fields. While numerous AImodels have been designed for specific tasks and applications, they oftenrequire considerable human efforts in finding the right model architecture,optimization algorithm, and hyperparameters. Recent advances in large languagemodels (LLMs) like ChatGPT show remarkable capabilities in various aspects ofreasoning, comprehension, and interaction. 
Consequently, we propose developingtask-oriented prompts and automatically utilizing LLMs to automate the trainingpipeline. To implement this concept, we present the AutoML-GPT, which employsGPT as the bridge to diverse AI models and dynamically trains models withoptimized hyperparameters. AutoML-GPT dynamically takes user requests from themodel and data cards and composes the corresponding prompt paragraph.Ultimately, with this prompt paragraph, AutoML-GPT will automatically conductthe experiments from data processing to model architecture, hyperparametertuning, and predicted training log. By leveraging {ours}'s robust languagecapabilities and the available AI models, AutoML-GPT can tackle numerousintricate AI tasks across various tasks and datasets. This approach achievesremarkable results in computer vision, natural language processing, and otherchallenging areas. Extensive experiments and ablation studies demonstrate thatour method can be general, effective, and beneficial for many AI tasks.", "output": "AutoML-GPT: Automatic Machine Learning with GPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Residual networks have shown great success and become indispensable in recentdeep neural network models. In this work, we aim to re-investigate the trainingprocess of residual networks from a novel social psychology perspective ofloafing, and further propose a new training scheme as well as three improvedstrategies for boosting residual networks beyond their performance limits.Previous research has suggested that residual networks can be considered asensembles of shallow networks, which implies that the final performance of aresidual network is influenced by a group of subnetworks. 
We identify a previously overlooked problem that is analogous to social loafing, where subnetworks within a residual network are prone to exert less effort when working as part of a group compared to working alone. We define this problem as \\textit{network loafing}. Similar to the decreased individual productivity and overall performance as demonstrated in society, network loafing inevitably causes sub-par performance. Inspired by solutions from social psychology, we first propose a novel training scheme called stimulative training, which randomly samples a residual subnetwork and calculates the KL divergence loss between the sampled subnetwork and the given residual network for extra supervision. In order to unleash the potential of stimulative training, we further propose three simple-yet-effective strategies, including a novel KL- loss that only aligns the network logits direction, random smaller inputs for subnetworks, and inter-stage sampling rules. Comprehensive experiments and analysis verify the effectiveness of stimulative training as well as its three improved strategies.", "output": "Stimulative Training++: Go Beyond The Performance Limits of Residual Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentially realized at delayed time instances. The partial anonymity attribute implies that a learner, for each state, only observes the aggregate of past reward components generated as a result of different actions taken at that state, but realized at the observation instance.
We propose an algorithm named $\\mathrm{DUCRL2}$ to obtain a near-optimal policy for this setting and show that it achieves a regret bound of $\\tilde{\\mathcal{O}}\\left(DS\\sqrt{AT} + d (SA)^3\\right)$ where $S$ and $A$ are the sizes of the state and action spaces, respectively, $D$ is the diameter of the MDP, $d$ is a parameter upper bounded by the maximum reward delay, and $T$ denotes the time horizon. This demonstrates the optimality of the bound in the order of $T$, and an additive impact of the delay.", "output": "Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language has a strong influence on our perceptions of time and rewards. This raises the question of whether large language models, when asked in different languages, show different preferences for rewards over time and if their choices are similar to those of humans. In this study, we analyze the responses of GPT-3.5 (hereafter referred to as GPT) to prompts in multiple languages, exploring preferences between smaller, sooner rewards and larger, later rewards. Our results show that GPT displays greater patience when prompted in languages with weak future tense references (FTR), such as German and Mandarin, compared to languages with strong FTR, like English and French. These findings are consistent with existing literature and suggest a correlation between GPT's choices and the preferences of speakers of these languages. However, further analysis reveals that the preference for earlier or later rewards does not systematically change with reward gaps, indicating a lexicographic preference for earlier payments.
While GPT may capture intriguing variations across languages, our findings indicate that the choices made by these models do not correspond to those of human decision-makers.", "output": "Language, Time Preferences, and Consumer Behavior: Evidence from Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the many use cases for large language models (LLMs) in the design of chatbots in various industries and the research showing the importance of personalizing chatbots to cater to different personality traits, little work has been done to evaluate whether the behaviors of personalized LLMs can reflect certain personality traits accurately and consistently. We consider studying the behavior of LLM-based simulated agents which we refer to as LLM personas and present a case study with GPT-3.5 (text-davinci-003) to investigate whether LLMs can generate content with consistent, personalized traits when assigned Big Five personality types and gender roles. We created 320 LLM personas (5 females and 5 males for each of the 32 Big Five personality types) and prompted them to complete the classic 44-item Big Five Inventory (BFI) and then write an 800-word story about their childhood. Results showed that LLM personas' self-reported BFI scores are consistent with their assigned personality types, with large effect sizes found on all five traits. Moreover, significant correlations were found between assigned personality types and some Linguistic Inquiry and Word Count (LIWC) psycholinguistic features of their writings. For instance, extroversion is associated with pro-social and active words, and neuroticism is associated with words related to negative emotions and mental health. Besides, we only found significant differences in using technological and cultural words in writing between LLM-generated female and male personas.
This work provides a first step for further research on personalized LLMs and their applications in Human-AI conversation.", "output": "PersonaLLM: Investigating the Ability of GPT-3.5 to Express Personality Traits and Gender Differences."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? The answer is Yes. Large AI tools, such as large language models, always require more and better quality data to continuously improve, but current copyright laws limit their access to various types of data. Sharing revenue between AI tools and their data providers could transform the current hostile zero-sum game relationship between AI tools and a majority of copyrighted data owners into a collaborative and mutually beneficial one, which is necessary to facilitate the development of a virtuous cycle among AI tools, their users and data providers that drives forward AI technology and builds a healthy AI ecosystem. However, current revenue-sharing business models do not work for AI tools in the forthcoming AI era, since the most widely used metrics for website-based traffic and action, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools.
A completely new revenue-sharing business model, which must be almost independent of AI tools and be easily explained to data providers, needs to establish a prompt-based scoring system to measure data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers for AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era where all parties benefit.", "output": "Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Keyphrase prediction aims to generate phrases (keyphrases) that highly summarize a given document. Recently, researchers have conducted in-depth studies on this task from various perspectives. In this paper, we comprehensively summarize representative studies from the perspectives of dominant models, datasets and evaluation metrics. Our work analyzes up to 167 previous works, achieving greater coverage of this task than previous surveys. Particularly, we focus highly on deep learning-based keyphrase prediction, which attracts increasing attention of this task in recent years. Afterwards, we conduct several groups of experiments to carefully compare representative models. To the best of our knowledge, our work is the first attempt to compare these models using the identical commonly-used datasets and evaluation metric, facilitating in-depth analyses of their disadvantages and advantages.
Finally, we discuss the possible research directions of this task in the future.", "output": "From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian optimization (BO) is a powerful tool for seeking the global optimum of black-box functions. While evaluations of the black-box functions can be highly costly, it is desirable to reduce the use of expensive labeled data. For the first time, we introduce a teacher-student model to exploit semi-supervised learning that can make use of large amounts of unlabelled data under the context of BO. Importantly, we show that the selection of the validation and unlabeled data is key to the performance of BO. To optimize the sampling of unlabeled data, we employ a black-box parameterized sampling distribution optimized as part of the employed bi-level optimization framework. Taking one step further, we demonstrate that the performance of BO can be further improved by selecting unlabeled data from a dynamically fitted extreme value distribution. Our BO method operates in a learned latent space with reduced dimensionality, making it scalable to high-dimensional problems. The proposed approach outperforms significantly the existing BO methods on several synthetic and real-world optimization tasks.", "output": "High-dimensional Bayesian Optimization via Semi-supervised Learning with Optimized Unlabeled Data Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The affective reasoning task is a set of emerging affect-based tasks in conversation, including Emotion Recognition in Conversation (ERC), Emotion-Cause Pair Extraction (ECPE), and Emotion-Cause Span Recognition (ECSR).
Existing methods make various assumptions on the apparent relationship while neglecting the essential causal model due to the nonuniqueness of skeletons and unobservability of implicit causes. This paper resolves the above two problems and further proposes Conversational Affective Causal Discovery (CACD). It is a novel causal discovery method showing how to discover causal relationships in a conversation via designing a common skeleton and generating a substitute for implicit causes. CACD contains two steps: (i) building a common centering one graph node causal skeleton for all utterances in variable-length conversations; (ii) Causal Auto-Encoder (CAE) correcting the skeleton to yield causal representation through generated implicit causes and known explicit causes. Comprehensive experiments demonstrate that our novel method significantly outperforms the SOTA baselines in six affect-related datasets on the three tasks.", "output": "Affective Reasoning at Utterance Level in Conversations: A Causal Discovery Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As the popularity of large language models (LLMs) soars across various applications, ensuring their alignment with human values has become a paramount concern. In particular, given that LLMs have great potential to serve as general-purpose AI assistants in daily life, their subtly unethical suggestions become a serious and real concern. Tackling the challenge of automatically testing and repairing unethical suggestions is thus demanding. This paper introduces the first framework for testing and repairing unethical suggestions made by LLMs. We first propose ETHICSSUITE, a test suite that presents complex, contextualized, and realistic moral scenarios to test LLMs. We then propose a novel suggest-critic-reflect (SCR) process, serving as an automated test oracle to detect unethical suggestions.
We recast deciding if LLMs yield unethical suggestions (a hard problem; often requiring human expertise and costly to decide) into a PCR task that can be automatically checked for violation. Moreover, we propose a novel on-the-fly (OTF) repairing scheme that repairs unethical suggestions made by LLMs in real-time. The OTF scheme is applicable to LLMs in a black-box API setting with moderate cost. With ETHICSSUITE, our study on seven popular LLMs (e.g., ChatGPT, GPT-4) uncovers in total 109,824 unethical suggestions. We apply our OTF scheme on two LLMs (Llama-13B and ChatGPT), which generates valid repair to a considerable amount of unethical ones, paving the way for more ethically conscious LLMs.", "output": "\"Oops, Did I Just Say That?\" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial neural networks (ANNs) are increasingly used as research models, but questions remain about their generalizability and representational invariance. Biological neural networks under social constraints evolved to enable communicable representations, demonstrating generalization capabilities. This study proposes a communication protocol between cooperative agents to analyze the formation of individual and shared abstractions and their impact on task performance. This communication protocol aims to mimic language features by encoding high-dimensional information through low-dimensional representation. Using grid-world mazes and reinforcement learning, teacher ANNs pass a compressed message to a student ANN for better task completion. Through this, the student achieves a higher goal-finding rate and generalizes the goal location across task worlds.
Further optimizing message content to maximize student reward improves information encoding, suggesting that an accurate representation in the space of messages requires bi-directional input. This highlights the role of language as a common representation between agents and its implications on generalization capabilities.", "output": "A framework for the emergence and analysis of language in social learning agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As pointed out by several scholars, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Subsequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of the models trained on datasets labeled using different HS taxonomies cannot be compared. To ease this problem, we propose applying extremely weak supervision that only relies on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.", "output": "Towards Weakly-Supervised Hate Speech Classification Across Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In Causal Discovery with latent variables, we define two data paradigms: definite data: a single-skeleton structure with observed nodes single-value, and indefinite data: a set of multi-skeleton structures with observed nodes multi-value.
Multi-skeletons induce low sample utilization and multi-values induce incapability of the distribution assumption; as a result, recovering causal relations from indefinite data is, as of yet, largely unexplored. We design the causal strength variational model to resolve these two problems. Specifically, we leverage the causal strength instead of independent noise as latent variable to mediate evidence lower bound. By this design ethos, the causal strength of different skeletons is regarded as a distribution and can be expressed as a single-valued causal graph matrix. Moreover, considering the latent confounders, we disentangle the causal graph G into two relation subgraphs O and C. O contains pure relations between observed nodes, while C represents the relations from latent variables to observed nodes. We summarize the above designs as Confounding Disentanglement Causal Discovery (biCD), which is tailored to learn causal representation from indefinite data under the latent confounding. Finally, we conduct comprehensive experiments on synthetic and real-world data to demonstrate the effectiveness of our method.", "output": "Learning to Recover Causal Relationship from Indefinite Data in the Presence of Latent Confounders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice), and reduced error cascades (e.g., avoiding losing gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower.
In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity, while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers using a neural architecture search inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared decoder one.", "output": "Learning Language-Specific Layers for Multilingual Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "DECICE is a Horizon Europe project that is developing an AI-enabled open and portable management framework for automatic and adaptive optimization and deployment of applications in computing continuum encompassing from IoT sensors on the Edge to large-scale Cloud / HPC computing infrastructures. In this paper, we describe the DECICE framework and architecture. Furthermore, we highlight use-cases for framework evaluation: intelligent traffic intersection, magnetic resonance imaging, and emergency response.", "output": "DECICE: Device-Edge-Cloud Intelligent Collaboration Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning has been applied to train the dialog systems in many works. Previous approaches divide the dialog system into multiple modules including DST (dialog state tracking) and DP (dialog policy), and train these modules simultaneously. However, different modules influence each other during training.
The errors from DST might misguide the dialog policy, and the system action brings extra difficulties for the DST module. To alleviate this problem, we propose Asynchronous Updating Reinforcement Learning framework (AURL) that updates the DST module and the DP module asynchronously under a cooperative setting. Furthermore, curriculum learning is implemented to address the problem of unbalanced data distribution during reinforcement learning sampling, and multiple user models are introduced to increase the dialog diversity. Results on the public SSD-PHONE dataset show that our method achieves a compelling result with a 31.37% improvement on the dialog success rate. The code is publicly available via ", "output": "An Asynchronous Updating Reinforcement Learning Framework for Task-oriented Dialog System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) enables training ML models on edge clients without sharing data. However, the federated model's performance on local data varies, disincentivising the participation of clients who benefit little from FL. Fair FL reduces accuracy disparity by focusing on clients with higher losses while personalisation locally fine-tunes the model. Personalisation provides a participation incentive when an FL model underperforms relative to one trained locally. For situations where the federated model provides a lower accuracy than a model trained entirely locally by a client, personalisation improves the accuracy of the pre-trained federated weights to be similar to or exceed those of the local client model. This paper evaluates two Fair FL (FFL) algorithms as starting points for personalisation. Our results show that FFL provides no benefit to relative performance in a language task and may double the number of underperforming clients for an image task.
Instead, we propose Personalisation-aware Federated Learning (PaFL) as a paradigm that pre-emptively uses personalisation losses during training. Our technique shows a 50% reduction in the number of underperforming clients for the language task while lowering the number of underperforming clients in the image task instead of doubling it. Thus, evidence indicates that it may allow a broader set of devices to benefit from FL and represents a promising avenue for future experimentation and theoretical analysis.", "output": "Can Fair Federated Learning reduce the need for Personalisation?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the major challenges we face with ethical AI today is developing computational systems whose reasoning and behaviour are provably aligned with human values. Human values, however, are notorious for being ambiguous, contradictory and ever-changing. In order to bridge this gap, and get us closer to the situation where we can formally reason about implementing values into AI, this paper presents a formal representation of values, grounded in the social sciences. We use this formal representation to articulate the key challenges for achieving value-aligned behaviour in multiagent systems (MAS) and a research roadmap for addressing them.", "output": "Human Values in Multiagent Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks. Previous DTS methods either focus on semantic similarity or dialogue coherence to assess topic similarity for unsupervised dialogue segmentation. However, the topic similarity cannot be fully identified via semantic similarity or dialogue coherence.
In addition, the unlabeled dialogue data, which contains useful clues of utterance relationships, remains underexploited. In this paper, we propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data through neighboring utterance matching and pseudo-segmentation. Extensive experiments on two benchmark datasets (i.e., DialSeg711 and Doc2Dial) demonstrate that our method significantly outperforms the strong baseline methods. For reproducibility, we provide our code and data at:", "output": "Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the diverse array of work investigating the nature of human values from psychology, philosophy and social sciences, there is a clear consensus that values guide behaviour. More recently, a recognition that values provide a means to engineer ethical AI has emerged. Indeed, Stuart Russell proposed shifting AI's focus away from simply ``intelligence'' towards intelligence ``provably aligned with human values''. This challenge -- the value alignment problem -- with others including an AI's learning of human values, aggregating individual values to groups, and designing computational mechanisms to reason over values, has energised a sustained research effort. Despite this, no formal, computational definition of values has yet been proposed.
We address this through a formal conceptual framework rooted in the social sciences, that provides a foundation for the systematic, integrated and interdisciplinary investigation into how human values can support designing ethical AI.", "output": "A computational framework of human values for ethical AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Proactive dialogue systems, related to a wide range of real-world conversational applications, equip the conversational agent with the capability of leading the conversation direction towards achieving pre-defined targets or fulfilling certain goals from the system side. It is empowered by advanced techniques to progress to more complicated tasks that require strategical and motivational interactions. In this survey, we provide a comprehensive overview of the prominent problems and advanced designs for conversational agent's proactivity in different types of dialogues. Furthermore, we discuss challenges that meet the real-world application needs but require a greater research focus in the future. We hope that this first survey of proactive dialogue systems can provide the community with a quick access and an overall picture to this practical problem, and stimulate more progresses on conversational AI to the next level.", "output": "A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies show that graph neural networks (GNNs) are prevalent to model high-order relationships for collaborative filtering (CF).
Towards this research line, graph contrastive learning (GCL) has exhibited powerful performance in addressing the supervision label shortage issue by learning augmented user and item representations. While many of them show their effectiveness, two key questions still remain unexplored: i) Most existing GCL-based CF models are still limited by ignoring the fact that user-item interaction behaviors are often driven by diverse latent intent factors (e.g., shopping for family party, preferred color or brand of products); ii) Their introduced non-adaptive augmentation techniques are vulnerable to noisy information, which raises concerns about the model's robustness and the risk of incorporating misleading self-supervised signals. In light of these limitations, we propose a Disentangled Contrastive Collaborative Filtering framework (DCCF) to realize intent disentanglement with self-supervised augmentation in an adaptive fashion. With the learned disentangled representations with global context, our DCCF is able to not only distill finer-grained latent factors from the entangled self-supervision signals but also alleviate the augmentation-induced noise. Finally, the cross-view contrastive learning task is introduced to enable adaptive augmentation with our parameterized interaction mask generator. Experiments on various public datasets demonstrate the superiority of our method compared to existing solutions. Our model implementation is released at the link", "output": "Disentangled Contrastive Collaborative Filtering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models based on data from different tasks (e.g., document translation and chat translation).
Although the dataset-specific models have achieved impressive performance, it is cumbersome as each dataset demands a model to be designed, trained, and stored. In this work, we aim to unify these translation tasks into a more general setting. Specifically, we propose a ``versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT) that works with data from different tasks, and can translate well in multiple settings simultaneously, and theoretically it can be as many as possible. Through unified learning, UMLNMT is able to jointly train across multiple tasks, implementing intelligent on-demand translation. On seven widely-used translation tasks, including sentence translation, document translation, and chat translation, our UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs. Furthermore, UMLNMT can achieve competitive or better performance than state-of-the-art dataset-specific methods. Human evaluation and in-depth analysis also demonstrate the superiority of our approach on generating diverse and high-quality translations. Additionally, we provide a new genre translation dataset about famous aphorisms with 186k Chinese->English sentence pairs.", "output": "Unified Model Learning for Various Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general purpose programming languages. Domain specific languages, such as the ones used for IT Automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on the generation of Ansible-YAML, a widely used markup language for IT Automation.
We present Ansible Wisdom, a natural-language to Ansible-YAML code generation tool, aimed at improving IT automation productivity. Ansible Wisdom is a transformer-based model, extended by training with a new dataset containing Ansible-YAML. We also develop two novel performance metrics for YAML and Ansible to capture the specific characteristics of this domain. Results show that Ansible Wisdom can accurately generate Ansible script from natural language prompts with performance comparable to or better than existing state of the art code generation models.", "output": "Automated Code generation for Information Technology Tasks in YAML through Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For solving combinatorial optimisation problems with metaheuristics, different search operators are applied for sampling new solutions in the neighbourhood of a given solution. It is important to understand the relationship between operators for various purposes, e.g., adaptively deciding when to use which operator to find optimal solutions efficiently. However, it is difficult to theoretically analyse this relationship, especially in the complex solution space of combinatorial optimisation problems. In this paper, we propose to empirically analyse the relationship between operators in terms of the correlation between their local optima and develop a measure for quantifying their relationship. The comprehensive analyses on a wide range of capacitated vehicle routing problem benchmark instances show that there is a consistent pattern in the correlation between commonly used operators. Based on this newly proposed local optima correlation metric, we propose a novel approach for adaptively selecting among the operators during the search process.
The core intention is to improve search efficiency by preventing wasting computational resources on exploring neighbourhoods where the local optima have already been reached. Experiments on randomly generated instances and commonly used benchmark datasets are conducted. Results show that the proposed approach outperforms commonly used adaptive operator selection methods.", "output": "Local Optima Correlation Assisted Adaptive Operator Selection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Subset selection tasks arise in recommendation systems and search engines and ask to select a subset of items that maximize the value for the user. The values of subsets often display diminishing returns, and hence, submodular functions have been used to model them. If the inputs defining the submodular function are known, then existing algorithms can be used. In many applications, however, inputs have been observed to have social biases that reduce the utility of the output subset. Hence, interventions to improve the utility are desired. Prior works focus on maximizing linear functions -- a special case of submodular functions -- and show that fairness constraint-based interventions can not only ensure proportional representation but also achieve near-optimal utility in the presence of biases. We study the maximization of a family of submodular functions that capture functions arising in the aforementioned applications. Our first result is that, unlike linear functions, constraint-based interventions cannot guarantee any constant fraction of the optimal utility for this family of submodular functions. Our second result is an algorithm for submodular maximization. The algorithm provably outputs subsets that have near-optimal utility for this family under mild assumptions and that proportionally represent items from each group.
In empiricalevaluation, with both synthetic and real-world data, we observe that thisalgorithm improves the utility of the output subset for this family ofsubmodular functions over baselines.", "output": "Maximizing Submodular Functions for Recommendation in the Presence of Biases."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning has proven to be effective in improving the performanceof correlated tasks. Most of the existing methods use a backbone to extractinitial features with independent branches for each task, and the exchange ofinformation between the branches usually occurs through the concatenation orsum of the feature maps of the branches. However, this type of informationexchange does not directly consider the local characteristics of the image northe level of importance or correlation between the tasks. In this paper, wepropose a semantic segmentation method, MTLSegFormer, which combines multi-tasklearning and attention mechanisms. After the backbone feature extraction, twofeature maps are learned for each task. The first map is proposed to learnfeatures related to its task, while the second map is obtained by applyinglearned visual attention to locally re-weigh the feature maps of the othertasks. In this way, weights are assigned to local regions of the image of othertasks that have greater importance for the specific task. Finally, the two mapsare combined and used to solve a task. 
We tested the performance in twochallenging problems with correlated tasks and observed a significantimprovement in accuracy, mainly in tasks with high dependence on the others.", "output": "MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal emotion recognition identifies human emotions from various datamodalities like video, text, and audio. However, we found that this task can beeasily affected by noisy information that does not contain useful semantics. Tothis end, we present a novel paradigm that attempts to extract noise-resistantfeatures in its pipeline and introduces a noise-aware learning scheme toeffectively improve the robustness of multimodal emotion understanding. Our newpipeline, namely Noise-Resistant Multimodal Transformer (NORM-TR), mainlyintroduces a Noise-Resistant Generic Feature (NRGF) extractor and a Transformerfor the multimodal emotion recognition task. In particular, we make the NRGFextractor learn a generic and disturbance-insensitive representation so thatconsistent and meaningful semantics can be obtained. Furthermore, we apply aTransformer to incorporate Multimodal Features (MFs) of multimodal inputs basedon their relations to the NRGF. Therefore, the possible insensitive but usefulinformation of NRGF could be complemented by MFs that contain more details. Totrain the NORM-TR properly, our proposed noise-aware learning schemecomplements normal emotion recognition losses by enhancing the learning againstnoises. Our learning scheme explicitly adds noises to either all the modalitiesor a specific modality at random locations of a multimodal input sequence. 
Wecorrespondingly introduce two adversarial losses to encourage the NRGFextractor to learn to extract the NRGFs invariant to the added noises, thusfacilitating the NORM-TR to achieve more favorable multimodal emotionrecognition performance. In practice, on several popular multimodal datasets,our NORM-TR achieves state-of-the-art performance and outperforms existingmethods by a large margin, which demonstrates that the ability to resist noisyinformation is important for effective emotion recognition.", "output": "Noise-Resistant Multimodal Transformer for Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When deploying artificial agents in real-world environments where theyinteract with humans, it is crucial that their behavior is aligned with thevalues, social norms or other requirements of that environment. However, manyenvironments have implicit constraints that are difficult to specify andtransfer to a learning agent. To address this challenge, we propose a novelmethod that utilizes the principle of maximum causal entropy to learnconstraints and an optimal policy that adheres to these constraints, usingdemonstrations of agents that abide by the constraints. We prove convergence ina tabular setting and provide an approximation which scales to complexenvironments. We evaluate the effectiveness of the learned policy by assessingthe reward received and the number of constraint violations, and we evaluatethe learned cost function based on its transferability to other agents. 
Ourmethod has been shown to outperform state-of-the-art approaches across avariety of tasks and environments, and it is able to handle problems withstochastic dynamics and a continuous state-action space.", "output": "Maximum Causal Entropy Inverse Constrained Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Domain shift is a big challenge in NLP, thus, many approaches resort tolearning domain-invariant features to mitigate the inference phase domainshift. Such methods, however, fail to leverage the domain-specific nuancesrelevant to the task at hand. To avoid such drawbacks, domain counterfactualgeneration aims to transform a text from the source domain to a given targetdomain. However, due to the limited availability of data, such frequency-basedmethods often miss and lead to some valid and spurious domain-tokenassociations. Hence, we employ a three-step domain obfuscation approach thatinvolves frequency and attention norm-based masking, to mask domain-specificcues, and unmasking to regain the domain generic context. Our experimentsempirically show that the counterfactual samples sourced from our masked textlead to improved domain transfer on 10 out of 12 domain sentimentclassification settings, with an average of 2% accuracy improvement over thestate-of-the-art for unsupervised domain adaptation (UDA). Further, our modeloutperforms the state-of-the-art by achieving 1.4% average accuracy improvementin the adversarial domain adaptation (ADA) setting. Moreover, our model alsoshows its domain adaptation efficacy on a large multi-domain intentclassification dataset where it attains state-of-the-art results. 
We release the codes publicly at", "output": "ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dataset bias, i.e., the over-reliance on dataset-specific literal heuristics, is getting increasing attention for its detrimental effect on the generalization ability of NLU models. Existing works focus on eliminating dataset bias by down-weighting problematic data in the training process, which induces the omission of valid feature information while mitigating bias. In this work, we analyze the causes of dataset bias from the perspective of causal inference and propose CausalAPM, a generalizable literal disentangling framework to ameliorate the bias problem from feature granularity. The proposed approach projects literal and semantic information into independent feature subspaces, and constrains the involvement of literal information in subsequent predictions. Extensive experiments on three NLP benchmarks (MNLI, FEVER, and QQP) demonstrate that our proposed framework significantly improves the OOD generalization performance while maintaining ID performance.", "output": "CausalAPM: Generalizable Literal Disentanglement for NLU Debiasing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. 
Additionally,conventional sampling-based methods fail to capture necessary high-levelcontextual information, resulting in a significant loss of performance. In thispaper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as asolution to these challenges. HSGT successfully scales the Transformerarchitecture to node representation learning tasks on large-scale graphs, whilemaintaining high performance. By utilizing graph hierarchies constructedthrough coarsening techniques, HSGT efficiently updates and stores multi-scaleinformation in node embeddings at different levels. Together withsampling-based training methods, HSGT effectively captures and aggregatesmulti-level information on the hierarchical graph using only Transformerblocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-artperformance on large-scale benchmarks with graphs containing millions of nodeswith high efficiency.", "output": "Hierarchical Transformer for Scalable Graph Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study aimed to explore the experiences, perceptions, knowledge,concerns, and intentions of Gen Z students with Gen X and Gen Y teachersregarding the use of generative AI (GenAI) in higher education. A sample ofstudents and teachers were recruited to investigate the above using a surveyconsisting of both open and closed questions. The findings showed that Gen Zparticipants were generally optimistic about the potential benefits of GenAI,including enhanced productivity, efficiency, and personalized learning, andexpressed intentions to use GenAI for various educational purposes. 
Gen X andGen Y teachers acknowledged the potential benefits of GenAI but expressedheightened concerns about overreliance, ethical and pedagogical implications,emphasizing the need for proper guidelines and policies to ensure responsibleuse of the technology. The study highlighted the importance of combiningtechnology with traditional teaching methods to provide a more effectivelearning experience. Implications of the findings include the need to developevidence-based guidelines and policies for GenAI integration, foster criticalthinking and digital literacy skills among students, and promote responsibleuse of GenAI technologies in higher education.", "output": "The AI generation gap: Are Gen Z students more interested in adopting generative AI such as ChatGPT in teaching and learning than their Gen X and Millennial Generation teachers?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a widely used technique for improving model performancein machine learning, particularly in computer vision and natural languageprocessing. Recently, there has been increasing interest in applyingaugmentation techniques to reinforcement learning (RL) problems, with a focuson image-based augmentation. In this paper, we explore a set of genericwrappers designed to augment RL environments with noise and encourage agentexploration and improve training data diversity which are applicable to a broadspectrum of RL algorithms and environments. Specifically, we concentrate onaugmentations concerning states, rewards, and transition dynamics and introducetwo novel augmentation techniques. In addition, we introduce a noise ratehyperparameter for control over the frequency of noise injection. 
We presentexperimental results on the impact of these wrappers on return using threepopular RL algorithms, Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), andProximal Policy Optimization (PPO), across five MuJoCo environments. To supportthe choice of augmentation technique in practice, we also present analysis thatexplores the performance these techniques across environments. Lastly, wepublish the wrappers in our noisyenv repository for use with gym environments.", "output": "Simple Noisy Environment Augmentation for Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Emergent chain-of-thought (CoT) reasoning capabilities promise to improveperformance and explainability of large language models (LLMs). However,uncertainties remain about how prompting strategies formulated for previousmodel generations generalize to new model generations and different datasets.In this small-scale study we compare the performance of a range of zero-shotprompts for inducing CoT reasoning across six recently released LLMs(davinci-002, davinci-003, GPT-3.5-turbo, GPT-4, Flan-T5-xxl and Coherecommand-xlarge) on a mixture of six question-answering datasets, includingdatasets from scientific and medical domains. We find that a CoT prompt thatwas previously discovered through automated prompt discovery shows robustperformance across experimental conditions and produces best results whenapplied to the state-of-the-art model GPT-4.", "output": "An automatically discovered chain-of-thought prompt generalizes to novel models and datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have achieved remarkable success in variousreal-world applications. 
However, recent studies highlight the vulnerability of GNNs to malicious perturbations. Previous adversaries primarily focus on graph modifications or node injections to existing graphs, yielding promising results but with notable limitations. Graph modification attack (GMA) requires manipulation of the original graph, which is often impractical, while graph injection attack (GIA) necessitates training a surrogate model in the black-box setting, leading to significant performance degradation due to divergence between the surrogate architecture and the actual victim model. Furthermore, most methods concentrate on a single attack goal and lack a generalizable adversary to develop distinct attack strategies for diverse goals, thus limiting precise control over victim model behavior in real-world scenarios. To address these issues, we present a gradient-free generalizable adversary that injects a single malicious node to manipulate the classification result of a target node in the black-box evasion setting. We propose Gradient-free Generalizable Single Node Injection Attack, namely G$^2$-SNIA, a reinforcement learning framework employing Proximal Policy Optimization. By directly querying the victim model, G$^2$-SNIA learns patterns from exploration to achieve diverse attack goals with extremely limited attack budgets. Through comprehensive experiments over three acknowledged benchmark datasets and four prominent GNNs in the most challenging and realistic scenario, we demonstrate the superior performance of our proposed G$^2$-SNIA over the existing state-of-the-art baselines. 
Moreover, by comparing G$^2$-SNIA with multiplewhite-box evasion baselines, we confirm its capacity to generate solutionscomparable to those of the best adversaries.", "output": "Single Node Injection Label Specificity Attack on Graph Neural Networks via Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Obtaining high-quality data for collaborative training of machine learningmodels can be a challenging task due to A) the regulatory concerns and B) lackof incentive to participate. The first issue can be addressed through the useof privacy enhancing technologies (PET), one of the most frequently used onebeing differentially private (DP) training. The second challenge can beaddressed by identifying which data points can be beneficial for model trainingand rewarding data owners for sharing this data. However, DP in deep learningtypically adversely affects atypical (often informative) data samples, makingit difficult to assess the usefulness of individual contributions. In this workwe investigate how to leverage gradient information to identify trainingsamples of interest in private training settings. We show that there existtechniques which are able to provide the clients with the tools for principleddata selection even in strictest privacy settings.", "output": "Leveraging gradient-derived metrics for data selection and valuation in differentially private training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recommender system or crowdsourcing applications of online learning, ahuman's preferences or abilities are often a function of the algorithm's recentactions. 
Motivated by this, a significant line of work has formalized settings where an action's loss is a function of the number of times that action was recently played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity. To more faithfully capture decay of human memory with time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss is a function of a weighted summation of the number of times that arm was played in the last $m$ timesteps. This WTB setting is intractable without further assumption. So we study it under Repeated Exposure Optimality (REO), a condition motivated by the literature on human physiology, which requires the existence of an action that when repetitively played will eventually yield smaller loss than any other sequence of actions. We study the minimization of the complete policy regret (CPR), which is the strongest notion of regret, in WTB under REO. Since $m$ is typically unknown, we assume we only have access to an upper bound $M$ on $m$. We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $O\\left( \\sqrt{KT} + (m+M)K \\right)$ CPR. Interestingly, up to an additive (in lieu of multiplicative) factor in $(m+M)K$, this recovers the classical guarantee for the simpler stochastic multi-armed bandit with traditional regret. 
We additionally show that in our setting, any algorithm will suffer additive CPR of $\\Omega\\left( mK + M \\right)$, demonstrating our result is nearly optimal. Our algorithm is computationally efficient, and we experimentally demonstrate its practicality and superiority over natural baselines.", "output": "Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many machine learning (ML) libraries are accessible online for ML practitioners. Typical ML pipelines are complex and consist of a series of steps, each of them invoking several ML libraries. In this demo paper, we present ExeKGLib, a Python library that allows users with coding skills and minimal ML knowledge to build ML pipelines. ExeKGLib relies on knowledge graphs to improve the transparency and reusability of the built ML workflows, and to ensure that they are executable. We demonstrate the usage of ExeKGLib and compare it with conventional ML code to show its benefits.", "output": "ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct the trajectory conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities, by simply choosing appropriate masks at inference time. For example, the same MTM network can be used as a forward dynamics model, inverse dynamics model, or even an offline RL agent. 
Through extensive experiments in several continuouscontrol tasks, we show that the same MTM network -- i.e. same weights -- canmatch or outperform specialized networks trained for the aforementionedcapabilities. Additionally, we find that state representations learned by MTMcan significantly accelerate the learning speed of traditional RL algorithms.Finally, in offline RL benchmarks, we find that MTM is competitive withspecialized offline RL algorithms, despite MTM being a generic self-supervisedlearning method without any explicit RL components. Code is available at", "output": "Masked Trajectory Models for Prediction, Representation, and Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes the results of SemEval 2023 task 7 -- Multi-EvidenceNatural Language Inference for Clinical Trial Data (NLI4CT) -- consisting of 2tasks, a Natural Language Inference (NLI) task, and an evidence selection taskon clinical trial data. The proposed challenges require multi-hop biomedicaland numerical reasoning, which are of significant importance to the developmentof systems capable of large-scale interpretation and retrieval of medicalevidence, to provide personalized evidence-based care.Task 1, the entailment task, received 643 submissions from 40 participants,and Task 2, the evidence selection task, received 364 submissions from 23participants. The tasks are challenging, with the majority of submitted systemsfailing to significantly outperform the majority class baseline on theentailment task, and we observe significantly better performance on theevidence selection task than on the entailment task. Increasing the number ofmodel parameters leads to a direct increase in performance, far moresignificant than the effect of biomedical pre-training. 
Future works couldexplore the limitations of large models for generalization and numericalinference, and investigate methods to augment clinical datasets to allow formore rigorous testing and to facilitate fine-tuning.We envisage that the dataset, models, and results of this task will be usefulto the biomedical NLI and evidence retrieval communities. The dataset,competition leaderboard, and website are publicly available.", "output": "SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for Clinical Trial Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding the performance of machine learning (ML) models across diversedata distributions is critically important for reliable applications. Despiterecent empirical studies positing a near-perfect linear correlation betweenin-distribution (ID) and out-of-distribution (OOD) accuracies, we empiricallydemonstrate that this correlation is more nuanced under subpopulation shifts.Through rigorous experimentation and analysis across a variety of datasets,models, and training epochs, we demonstrate that OOD performance often has anonlinear correlation with ID performance in subpopulation shifts. Ourfindings, which contrast previous studies that have posited a linearcorrelation in model performance during distribution shifts, reveal a \"moonshape\" correlation (parabolic uptrend curve) between the test performance onthe majority subpopulation and the minority subpopulation. This non-trivialnonlinear correlation holds across model architectures, hyperparameters,training durations, and the imbalance between subpopulations. Furthermore, wefound that the nonlinearity of this \"moon shape\" is causally influenced by thedegree of spurious correlations in the training data. 
Our controlledexperiments show that stronger spurious correlation in the training datacreates more nonlinear performance correlation. We provide complementaryexperimental and theoretical analyses for this phenomenon, and discuss itsimplications for ML reliability and fairness. Our work highlights theimportance of understanding the nonlinear effects of model improvement onperformance in different subpopulations, and has the potential to inform thedevelopment of more equitable and responsible machine learning models.", "output": "On the nonlinear correlation of ML performance between data subpopulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tabular data is one of the most commonly used types of data in machinelearning. Despite recent advances in neural nets (NNs) for tabular data, thereis still an active discussion on whether or not NNs generally outperformgradient-boosted decision trees (GBDTs) on tabular data, with several recentworks arguing either that GBDTs consistently outperform NNs on tabular data, orvice versa. In this work, we take a step back and ask, 'does it matter?' Weconduct the largest tabular data analysis to date, by comparing 19 algorithmsacross 176 datasets, and we find that the 'NN vs. GBDT' debate isoveremphasized: for a surprisingly high number of datasets, either theperformance difference between GBDTs and NNs is negligible, or lighthyperparameter tuning on a GBDT is more important than selecting the bestalgorithm. Next, we analyze 965 metafeatures to determine what properties of adataset make NNs or GBDTs better-suited to perform well. 
For example, we find that GBDTs are much better than NNs at handling skewed feature distributions, heavy-tailed feature distributions, and other forms of dataset irregularities. Our insights act as a guide for practitioners to decide whether or not they need to run a neural net to reach top performance on their dataset. Our codebase and all raw results are available at", "output": "When Do Neural Nets Outperform Boosted Trees on Tabular Data?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite Convolutional Neural Networks having reached human-level performance in some medical tasks, their clinical use has been hindered by their lack of interpretability. Two major interpretability strategies have been proposed to tackle this problem: post-hoc methods and intrinsic methods. Although there are several post-hoc methods to interpret DL models, there is significant variation between the explanations provided by each method, and it is difficult to validate them due to the lack of ground-truth. To address this challenge, we adapted the intrinsically interpretable ProtoPNet for the context of histopathology imaging and compared the attribution maps produced by it and the saliency maps made by post-hoc methods. To evaluate the similarity between saliency map methods and attribution maps we adapted 10 saliency metrics from the saliency model literature, and used the breast cancer metastases detection dataset PatchCamelyon with 327,680 patches of histopathological images of sentinel lymph node sections to validate the proposed approach. 
Overall,SmoothGrad and Occlusion were found to have a statistically bigger overlap withProtoPNet while Deconvolution and Lime have been found to have the least.", "output": "Evaluating Post-hoc Interpretability with Intrinsic Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The study of code example recommendation has been conducted extensively inthe past and recently in order to assist developers in their softwaredevelopment tasks. This is because developers often spend significant timesearching for relevant code examples on the internet, utilizing open-sourceprojects and informal documentation. For finding useful code examples, informaldocumentation, such as Stack Overflow discussions and forums, can beinvaluable. We have focused our research on Stack Overflow, which is a popularresource for discussing different topics among software developers. Forincreasing the quality of the recommended code examples, we have collected andrecommended the best code examples in the Java programming language. We haveutilized BERT in our approach, which is a Large Language Model (LLM) for textrepresentation that can effectively extract semantic information from textualdata. Our first step involved using BERT to convert code examples intonumerical vectors. Subsequently, we applied LSH to identify Approximate NearestNeighbors (ANN). Our research involved the implementation of two variants ofthis approach, namely the Random Hyperplane-based LSH and the Query-Aware LSH.Our study compared two algorithms using four parameters: HitRate, MeanReciprocal Rank (MRR), Average Execution Time, and Relevance. The results ofour analysis revealed that the Query- Aware (QA) approach outperformed theRandom Hyperplane-based (RH) approach in terms of HitRate. 
Specifically, the QAapproach achieved a HitRate improvement of 20% to 35% for query pairs comparedto the RH approach. Creating hashing tables and assigning data samples tobuckets using the QA approach is at least four times faster than the RHapproach. The QA approach returns code examples within milliseconds, while ittakes several seconds (sec) for the RH approach to recommend code examples.", "output": "Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Clustering is at the very core of machine learning, and its applicationsproliferate with the increasing availability of data. However, as datasetsgrow, comparing clusterings with an adjustment for chance becomescomputationally difficult, preventing unbiased ground-truth comparisons andsolution selection. We propose FastAMI, a Monte Carlo-based method toefficiently approximate the Adjusted Mutual Information (AMI) and extend it tothe Standardized Mutual Information (SMI). The approach is compared with theexact calculation and a recently developed variant of the AMI based on pairwisepermutations, using both synthetic and real data. 
In contrast to the exactcalculation our method is fast enough to enable these adjustedinformation-theoretic comparisons for large datasets while maintainingconsiderably more accurate results than the pairwise approach.", "output": "FastAMI -- a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This project focuses on enhancing open-source large language models throughinstruction-tuning and providing comprehensive evaluations of theirperformance. We explore how various training data factors, such as quantity,quality, and linguistic distribution, influence the performance ofinstruction-tuned models trained on publicly accessible high-qualityinstruction datasets for both English and Chinese languages. Our goal is tosupplement evaluation with quantitative analyses, providing valuable insightsfor the continued advancement of open-source chat models. Our model, data, andcode are publicly available for others to use and build upon.", "output": "Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce three simple randomized variants of byte pair encoding (BPE) andexplore whether randomizing the selection of merge operations substantiallyaffects a downstream machine translation task. We focus on translation intomorphologically rich languages, hypothesizing that this task may showsensitivity to the method of choosing subwords. Analysis using a Bayesianlinear model indicates that two of the variants perform nearlyindistinguishably compared to standard BPE while the other degrades performanceless than we anticipated. 
We conclude that although standard BPE is widely used, there exists an interesting universe of potential variations on it worth investigating. Our code is available at: ", "output": "What changes when you randomly choose BPE merge operations? Not much."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision, which is challenging to gather in the real world at scale. Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interactions. In this paper, we automatically extract 3D supervision (via multiview 2D supervision) from such raw video data to scale up the learning of models for hand-held object reconstruction. This requires tackling two key challenges: unknown camera pose and occlusion. For the former, we use hand pose (predicted from existing techniques, e.g., FrankMocap) as a proxy for object pose. For the latter, we learn data-driven 3D shape priors using synthetic objects from the ObMan dataset. We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image. 
Our experiments on the MOW and HO3D datasets show the effectiveness of these supervisory signals at predicting the 3D shape for real-world hand-held objects without any direct real-world 3D supervision.", "output": "Learning Hand-Held Object Reconstruction from In-The-Wild Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses. 
Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.", "output": "Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under-explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance. 
Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at ", "output": "Personalize Segment Anything Model with One Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is to, given a video sequence, segment both the projected extent of the target object, as well as the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.", "output": "Tracking through Containers and Occluders in the Wild."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nowadays, many of the images captured are 'observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera Image Signal Processor (ISP). 
However, the camera ISP is optimized for producing visually pleasing images for human observers and not for machines. Therefore, one may spare the ISP compute time and apply vision models directly to RAW images. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop, we use a dataset of RAW and RGB image pairs, which can be easily acquired with no human labeling. We then train a model that is applied directly to the RAW data by using knowledge distillation such that the model predictions for RAW images will be aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images. Our experiments show that our performance on RAW images for object classification and semantic segmentation is significantly better than that of models trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.", "output": "ISP Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantum Natural Language Processing (QNLP) deals with the design and implementation of NLP models intended to be run on quantum hardware. In this paper, we present results on the first NLP experiments conducted on Noisy Intermediate-Scale Quantum (NISQ) computers for datasets of size greater than 100 sentences. Exploiting the formal similarity of the compositional model of meaning by Coecke, Sadrzadeh and Clark (2010) with quantum theory, we create representations for sentences that have a natural mapping to quantum circuits. We use these representations to implement and successfully train NLP models that solve simple sentence classification tasks on quantum hardware. 
We conduct quantum simulations that compare the syntax-sensitive model of Coecke et al. with two baselines that use less or no syntax; specifically, we implement the quantum analogues of a \"bag-of-words\" model, where syntax is not taken into account at all, and of a word-sequence model, where only word order is respected. We demonstrate that all models converge smoothly both in simulations and when run on quantum hardware, and that the results are the expected ones based on the nature of the tasks and the datasets used. Another important goal of this paper is to describe in a way accessible to AI and NLP researchers the main principles, process and challenges of experiments on quantum hardware. Our aim in doing this is to take the first small steps in this unexplored research territory and pave the way for practical Quantum Natural Language Processing.", "output": "QNLP in Practice: Running Compositional Models of Meaning on a Quantum Computer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Discrete event systems (DES) have been deeply developed and applied in practice, but state complexity in DES is still an important problem to be better solved with innovative methods. With the development of quantum computing and quantum control, a natural problem is to simulate DES by means of quantum computing models and to establish quantum DES (QDES). The motivation is twofold: on the one hand, QDES have potential applications when DES are simulated and processed by quantum computers, where quantum systems are employed to simulate the evolution of states driven by discrete events, and on the other hand, QDES may have essential advantages over DES concerning state complexity for imitating some practical problems. 
So, the goal of this paper is to establish a basic framework of QDES by using quantum finite automata (QFA) as the modelling formalisms, and the supervisory control theorems of QDES are established and proved. Then we present a polynomial-time algorithm to decide whether or not the controllability condition holds. In particular, we construct a number of new examples of QFA to illustrate the supervisory control of QDES and to verify the essential advantages of QDES over classical DES in state complexity.", "output": "Supervisory Control of Quantum Discrete Event Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In linear algebra applications, elementary matrices hold a significant role. This paper presents a diagrammatic representation of all $2^m \\times 2^n$-sized elementary matrices in algebraic ZX-calculus, showcasing their properties on inverses and transpose through diagrammatic rewriting. Additionally, the paper uses this representation to depict the Jozsa-style matchgate in algebraic ZX-calculus. To further enhance practical use, we have implemented this representation in discopy. Overall, this work sets the groundwork for more applications of ZX-calculus such as synthesising controlled matrices [arXiv:2212.04462] in quantum computing.", "output": "Representing and Implementing Matrices Using Algebraic ZX-calculus."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many board games and other abstract games, patterns have been used as features that can guide automated game-playing agents. Such patterns or features often represent particular configurations of pieces, empty positions, etc., which may be relevant for a game's strategies. 
Their use has been particularly prevalent in the game of Go, but also in many other games used as benchmarks for AI research. In this paper, we formulate a design and efficient implementation of spatial state-action features for general games. These are patterns that can be trained to incentivise or disincentivise actions based on whether or not they match variables of the state in a local area around action variables. We provide extensive details on several design and implementation choices, with a primary focus on achieving a high degree of generality to support a wide variety of different games using different board geometries or other graphs. Secondly, we propose an efficient approach for evaluating active features for any given set of features. In this approach, we take inspiration from heuristics used in problems such as SAT to optimise the order in which parts of patterns are matched and prune unnecessary evaluations. This approach is defined for a highly general and abstract description of the problem -- phrased as optimising the order in which propositions of formulas in disjunctive normal form are evaluated -- and may therefore also be of interest to other types of problems than board games. 
An empirical evaluation on 33 distinct games in the Ludii general game system demonstrates the efficiency of this approach in comparison to a naive baseline, as well as a baseline based on prefix trees, and demonstrates that the additional efficiency significantly improves the playing strength of agents using the features to guide search.", "output": "Spatial State-Action Features for General Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Metaheuristics are widely recognized gradient-free solvers for hard problems that do not meet the rigorous mathematical assumptions of conventional solvers. The automated design of metaheuristic algorithms provides an attractive path to relieve manual design effort and gain enhanced performance beyond human-made algorithms. However, the specific algorithm prototype and linear algorithm representation in the current automated design pipeline restrict the design within a fixed algorithm structure, which hinders discovering novelties and diversity across the metaheuristic family. To address this challenge, this paper proposes a general framework, AutoOpt, for automatically designing metaheuristic algorithms with diverse structures. AutoOpt contains three innovations: (i) A general algorithm prototype dedicated to covering the metaheuristic family as widely as possible. It promotes high-quality automated design on different problems by fully discovering potentials and novelties across the family. (ii) A directed acyclic graph algorithm representation to fit the proposed prototype. Its flexibility and evolvability enable discovering various algorithm structures in a single run of design, thus boosting the possibility of finding high-performance algorithms. (iii) A graph representation embedding method offering an alternative compact form of the graph to be manipulated, which ensures AutoOpt's generality. 
Experiments on numerical functions and real applications validate AutoOpt's efficiency and practicability.", "output": "AutoOpt: A General Framework for Automatically Designing Metaheuristic Optimization Algorithms with Diverse Structures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language, which is an emerging research topic for both Natural Language Processing and Computer Vision. In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text as well as 16,558 question-answer pairs by extending the TAT-QA dataset. These documents are sampled from real-world financial reports and contain lots of numbers, which means discrete reasoning capability is demanded to answer questions on this dataset. Based on TAT-DQA, we further develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions with corresponding strategies, i.e., extraction or reasoning. Extensive experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, the performance still lags far behind that of expert humans. We expect that our new TAT-DQA dataset would facilitate the research on deep understanding of visually-rich documents combining vision and language, especially for scenarios that require discrete reasoning. Also, we hope the proposed model would inspire researchers to design more advanced Document VQA models in the future. 
Our dataset will be publicly available for non-commercial use at ", "output": "Towards Complex Document Understanding By Discrete Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal coherence between the vocalization and gestures explicitly. For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. 
Lastly, we build correspondence between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.", "output": "Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controller synthesis is in essence a case of model-based planning for non-deterministic environments in which plans (actually 'strategies') are meant to preserve system goals indefinitely. In the case of supervisory control, environments are specified as the parallel composition of state machines and valid strategies are required to be 'non-blocking' (i.e., always enabling the environment to reach certain marked states) in addition to safe (i.e., keeping the system within a safe zone). Recently, On-the-fly Directed Controller Synthesis techniques were proposed to avoid the exploration of the entire (and exponentially large) environment space, at the cost of non-maximal permissiveness, to either find a strategy or conclude that there is none. The incremental exploration of the plant is currently guided by a domain-independent human-designed heuristic. In this work, we propose a new method for obtaining heuristics based on Reinforcement Learning (RL). The synthesis algorithm is thus framed as an RL task with an unbounded action space and a modified version of DQN is used. With a simple and general set of features that abstracts both states and actions, we show that it is possible to learn heuristics on small versions of a problem that generalize to the larger instances, effectively doing zero-shot policy transfer. 
Our agents learn from scratch in a highly partially observable RL task and outperform the existing heuristic overall, in instances unseen during training.", "output": "Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This tutorial is an andragogical guide for students and practitioners seeking to understand the fundamentals and practice of linear programming. The exercises demonstrate how to solve classical optimization problems with an emphasis on spatial analysis in supply chain management and transport logistics. All exercises display the Python programs and optimization libraries used to solve them. The first chapter introduces key concepts in linear programming and contributes a new cognitive framework to help students and practitioners set up each optimization problem. The cognitive framework organizes the decision variables, constraints, the objective function, and variable bounds in a format for direct application to optimization software. The second chapter introduces two types of mobility optimization problems (shortest path in a network and minimum cost tour) in the context of delivery and service planning logistics. The third chapter introduces four types of spatial optimization problems (neighborhood coverage, flow capturing, zone heterogeneity, service coverage) and contributes a workflow to visualize the optimized solutions in maps. The workflow creates decision variables from maps by using the free geographic information systems (GIS) programs QGIS and GeoDA. The fourth chapter introduces three types of spatial logistical problems (spatial distribution, flow maximization, warehouse location optimization) and demonstrates how to scale the cognitive framework in software to reach solutions. 
The final chapter summarizes lessons learned and provides insights about how students and practitioners can modify the Python programs and GIS workflows to solve their own optimization problems and visualize the results.", "output": "Tutorial and Practice in Linear Programming: Optimization Problems in Supply Chain and Transport Logistics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image harmonization is a critical task in computer vision, which aims to adjust the foreground to make it compatible with the background. Recent works mainly focus on using global transformations (i.e., normalization and color curve rendering) to achieve visual consistency. However, these models ignore local visual consistency and their huge model sizes limit their harmonization ability on edge devices. In this paper, we propose a hierarchical dynamic network (HDNet) to adapt features from local to global view for better feature transformation in efficient image harmonization. Inspired by the success of various dynamic models, a local dynamic (LD) module and a mask-aware global dynamic (MGD) module are proposed in this paper. Specifically, LD matches local representations between the foreground and background regions based on semantic similarities, then adaptively adjusts every foreground local representation according to the appearance of its $K$-nearest neighbor background regions. In this way, LD can produce more realistic images at a more fine-grained level, and simultaneously enjoy the characteristic of semantic alignment. The MGD effectively applies distinct convolution to the foreground and background, learning the representations of foreground and background regions as well as their correlations to the global harmonization, facilitating local visual consistency for the images much more efficiently. 
Experimental results demonstrate that the proposed HDNet significantly reduces the total model parameters by more than 80% compared to previous methods, while still attaining state-of-the-art performance on the popular iHarmony4 dataset. Notably, the HDNet achieves a 4% improvement in PSNR and a 19% reduction in MSE compared to the prior state-of-the-art methods.", "output": "Hierarchical Dynamic Image Harmonization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for the emergence of such reasoning abilities and highlight future research directions. Resources are available at (updated periodically).", "output": "Reasoning with Language Model Prompting: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spectral bias is an important observation of neural network training, stating that the network will learn a low frequency representation of the target function before converging to higher frequency components. This property is interesting due to its link to good generalization in over-parameterized networks. However, in low dimensional settings, a severe spectral bias occurs that obstructs convergence to high frequency components entirely. In order to overcome this limitation, one can encode the inputs using a high frequency sinusoidal encoding. 
Previous works attempted to explain this phenomenon using Neural Tangent Kernel (NTK) and Fourier analysis. However, NTK does not capture real network dynamics, and Fourier analysis only offers a global perspective on the network properties that induce this bias. In this paper, we provide a novel approach towards understanding spectral bias by directly studying ReLU MLP training dynamics. Specifically, we focus on the connection between the computations of ReLU networks (activation regions), and the speed of gradient descent convergence. We study these dynamics in relation to the spatial information of the signal to understand how they influence spectral bias. We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.", "output": "Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Future collaborative robots must be capable of finding objects. As such a fundamental skill, we expect object search to eventually become an off-the-shelf capability for any robot, similar to, e.g., object detection, SLAM, and motion planning. However, existing approaches either make unrealistic compromises (e.g., reduce the problem from 3D to 2D), resort to ad-hoc, greedy search strategies, or attempt to learn end-to-end policies in simulation that are yet to generalize across real robots and environments. 
This thesis argues that through using Partially Observable Markov Decision Processes (POMDPs) to model object search while exploiting structures in the human world (e.g., octrees, correlations) and in human-robot interaction (e.g., spatial language), a practical and effective system for generalized object search can be achieved. In support of this argument, I develop methods and systems for (multi-)object search in 3D environments under uncertainty due to limited field of view, occlusion, noisy, unreliable detectors, spatial correlations between objects, and possibly ambiguous spatial language (e.g., \"The red car is behind Chase Bank\"). Besides evaluation in simulators such as PyGame, AirSim, and AI2-THOR, I design and implement a robot-independent, environment-agnostic system for generalized object search in 3D and deploy it on the Boston Dynamics Spot robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm, to perform object search in different environments. The system enables, for example, a Spot robot to find a toy cat hidden underneath a couch in a kitchen area in under one minute. This thesis also broadly surveys the object search literature, proposing taxonomies in object search problem settings, methods and systems.", "output": "Generalized Object Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, pre-trained large language models have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. The underlying mechanisms by which this capability arises from regular language model pretraining objectives remain poorly understood. 
In this study, we aim to examine the in-context learning phenomenon through a Bayesian lens, viewing large language models as topic models that implicitly infer task-related information from demonstrations. On this premise, we propose an algorithm for selecting optimal demonstrations from a set of annotated data and demonstrate a significant 12.5% improvement relative to the random selection baseline, averaged over eight GPT2 and GPT3 models on eight different real-world text classification datasets. Our empirical findings support our hypothesis that large language models implicitly infer a latent concept variable.", "output": "Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources. This survey provides the first systematic overview of the efficient training of Transformers, covering the recent progress in acceleration arithmetic and hardware, with a focus on the former. We analyze and compare methods that save computation and memory costs for intermediate tensors during training, together with techniques on hardware/algorithm co-design. We finally discuss challenges and promising areas for future research.", "output": "A Survey on Efficient Training of Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Desktop-based virtual colonoscopy has been proven to be an asset in the identification of colon anomalies. 
The process is accurate, although time-consuming. The use of immersive interfaces for virtual colonoscopy is incipient and not yet understood. In this work, we present a new design exploring elements of the VR paradigm to make the immersive analysis more efficient while still effective. We also plan the conduction of experiments with experts to assess the multi-factor influences of coverage, duration, and diagnostic accuracy.", "output": "Development of an Immersive Virtual Colonoscopy Viewer for Colon Growths Diagnosis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whole Slide Images (WSIs) or histopathology images are used in digital pathology. WSIs pose great challenges to deep learning models for clinical diagnosis, owing to their size and lack of pixel-level annotations. With the recent advancements in computational pathology, newer multiple-instance learning-based models have been proposed. Multiple-instance learning for WSIs necessitates creating patches and uses the encoding of these patches for diagnosis. These models use generic pre-trained models (ResNet-50 pre-trained on ImageNet) for patch encoding. The recently proposed KimiaNet, a DenseNet121 model pre-trained on TCGA slides, is a domain-specific pre-trained model. This paper shows the effect of domain-specific pre-training on WSI classification. To investigate the effect of domain-specific pre-training, we considered the current state-of-the-art multiple-instance learning models, 1) CLAM, an attention-based model, and 2) TransMIL, a self-attention-based model, and evaluated the models' confidence and predictive performance in detecting primary brain tumors - gliomas. 
Domain-specific pre-training improves the confidence of the models and also achieves a new state-of-the-art performance of WSI-based glioma subtype classification, showing a high clinical applicability in assisting glioma diagnosis. We will publicly share our code and experimental results at", "output": "Domain-Specific Pre-training Improves Confidence in Whole Slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of the real world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm that is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. 
Experiment results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for KITTI dataset. The code is available at ", "output": "DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. The sigmoid loss simultaneously allows further scaling up the batch size, while also performing better at smaller batch sizes. With only four TPUv4 chips, we can train a Base CLIP model at 4k batch size and a Large LiT model at 20k batch size, the latter achieves 84.5% ImageNet zero-shot accuracy in two days. This disentanglement of the batch size from the loss further allows us to study the impact of examples vs pairs and negative to positive ratio. Finally, we push the batch size to the extreme, up to one million, and find that the benefits of growing batch size quickly diminish, with a more reasonable batch size of 32k being sufficient. We hope our research motivates further explorations in improving the quality and efficiency of language-image pre-training.", "output": "Sigmoid Loss for Language Image Pre-Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is essentially a complex, intricate system of human expressions governed by grammatical rules. 
It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. 
Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.", "output": "A Survey of Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The retriever-reader framework is popular for open-domain question answering (ODQA), where a retriever samples for the reader a set of relevant candidate passages from a large corpus. A key assumption behind this method is that high relevance scores from the retriever likely indicate high answerability from the reader, which implies a high probability that the retrieved passages contain answers to a given question. In this work, we empirically dispel this belief and observe that recent dense retrieval models based on DPR often rank unanswerable counterfactual passages higher than their answerable original passages. To address such answer-unawareness in dense retrievers, we seek to use counterfactual samples as additional training resources to better synchronize the relevance measurement of DPR with the answerability of question-passage pairs. Specifically, we present counterfactually-Pivoting Contrastive Learning (PiCL), a novel representation learning approach for passage retrieval that leverages counterfactual samples as pivots between positive and negative samples in their learned embedding space. We incorporate PiCL into the retriever training to show the effectiveness of PiCL on ODQA benchmarks and the robustness of the learned models.", "output": "Revisiting Dense Retrieval with Unanswerable Counterfactuals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in generative artificial intelligence (AI) have captured worldwide attention. 
Tools such as Dalle-2 and ChatGPT suggest that tasks previously thought to be beyond the capabilities of AI may now augment the productivity of creative media in various new ways, including through the generation of synthetic video. This research paper explores the utility of using AI-generated synthetic video to create viable educational content for online educational settings. To date, there is limited research investigating the real-world educational value of AI-generated synthetic media. To address this gap, we examined the impact of using AI-generated synthetic video in an online learning platform on both learners content acquisition and learning experience. We took a mixed-method approach, randomly assigning adult learners (n=83) into one of two micro-learning conditions, collecting pre- and post-learning assessments, and surveying participants on their learning experience. The control condition included a traditionally produced instructor video, while the experimental condition included a synthetic video with a realistic AI-generated character. The results show that learners in both conditions demonstrated significant improvement from pre- to post-learning (p<.001), with no significant differences in gains between the two conditions (p=.80). In addition, no differences were observed in how learners perceived the traditional and synthetic videos. 
These findings suggest that AI-generated synthetic learning videos have the potential to be a viable substitute for videos produced via traditional methods in online educational settings, making high quality educational content more accessible across the globe.", "output": "Generative AI for learning: Investigating the potential of synthetic learning videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial advancements, the generalization of existing approaches to real-world applications remains challenging. This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets, which often leads to overfitting during training or saturation during testing. In terms of quantity, the number of spoof subjects is a critical determinant. Most datasets comprise fewer than 2,000 subjects. With regard to diversity, the majority of datasets consist of spoof samples collected in controlled environments using repetitive, mechanical processes. This data collection methodology results in homogenized samples and a dearth of scenario diversity. To address these shortcomings, we introduce the Wild Face Anti-Spoofing (WFAS) dataset, a large-scale, diverse FAS dataset collected in unconstrained settings. Our dataset encompasses 853,729 images of 321,751 spoof subjects and 529,571 images of 148,169 live subjects, representing a substantial increase in quantity. Moreover, our dataset incorporates spoof data obtained from the internet, spanning a wide array of scenarios and various commercial sensors, including 17 presentation attacks (PAs) that encompass both 2D and 3D forms. This novel data collection strategy markedly enhances FAS data diversity. 
Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR2023 workshop. Additionally, we meticulously evaluate representative methods using Protocol 1 and Protocol 2 (Unknown-Type). Through an in-depth examination of the challenge outcomes and benchmark baselines, we provide insightful analyses and propose potential avenues for future research. The dataset is released under Insightface.", "output": "Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, present grand challenges in compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. 
In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at:", "output": "Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating issue, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to get the quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. 
Audio samples are available at", "output": "NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely different noise patterns circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. 
Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.", "output": "Detecting Adversarial Faces Using Only Real Face Self-Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Monitoring the health status of patients in the Intensive Care Unit (ICU) is a critical aspect of providing superior care and treatment. The availability of large-scale electronic health records (EHR) provides machine learning models with an abundance of clinical text and vital sign data, enabling them to make highly accurate predictions. Despite the emergence of advanced Natural Language Processing (NLP) algorithms for clinical note analysis, the complex textual structure and noise present in raw clinical data have posed significant challenges. Coarse embedding approaches without domain-specific refinement have limited the accuracy of these algorithms. To address this issue, we propose FINEEHR, a system that utilizes two representation learning techniques, namely metric learning and fine-tuning, to refine clinical note embeddings, while leveraging the intrinsic correlations among different health statuses and note categories. We evaluate the performance of FINEEHR using two metrics, namely Area Under the Curve (AUC) and AUC-PR, on a real-world MIMIC III dataset. Our experimental results demonstrate that both refinement approaches improve prediction accuracy, and their combination yields the best results. 
Moreover, our proposed method outperforms prior works, with an AUC improvement of over 10%, achieving an average AUC of 96.04% and an average AUC-PR of 96.48% across various classifiers.", "output": "FineEHR: Refine Clinical Note Representations to Improve Mortality Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When applied to Image-to-text models, interpretability methods often provide token-by-token explanations namely, they compute a visual explanation for each token of the generated sequence. Those explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these models often require some sort of approximation that eventually leads to misleading explanations. We develop a framework based on SHAP, that allows for generating comprehensive, meaningful explanations leveraging the meaning representation of the output sequence as a whole. Moreover, by exploiting semantic priors in the visual backbone, we extract an arbitrary number of features that allows the efficient computation of Shapley values on large-scale models, generating at the same time highly meaningful visual explanations. We demonstrate that our method generates semantically more expressive explanations than traditional methods at a lower compute cost and that it can be generalized over other explainability methods.", "output": "Interpreting Vision and Language Generative Models with Semantic Visual Priors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. 
In this work we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide here new algorithms that explicitly consider and adapt to the relationship between problem instances and experts' knowledge. We first propose and highlight the drawbacks of a naive approach based on nearest neighbor queries. To address these drawbacks we then introduce a novel algorithm - expertise trees - that constructs decision trees enabling the learner to select appropriate models. We provide theoretical insights and empirically validate the improved performance of our novel approach on a range of problems for which existing methods proved to be inadequate.", "output": "Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most exploration research on reinforcement learning (RL) has paid attention to `the way of exploration', which is `how to explore'. The other exploration research, `when to explore', has not been the main focus of RL exploration research. The issue of `when' of a monolithic exploration in the usual RL exploration behaviour binds an exploratory action to an exploitational action of an agent. Recently, a non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals. The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously. We describe the initial research of an autonomous multi-mode exploration of non-monolithic behaviour in an options framework. 
The higher performance of our method is shown against the existing non-monolithic exploration method through comparative experimental results.", "output": "An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. Despite recent progress in the field, existing dataset distillation methods fail to generalize to new architectures and scale to high-resolution datasets. To overcome the above issues, we propose to use the learned prior from pre-trained deep generative models to synthesize the distilled data. To achieve this, we present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space. Our method augments existing techniques, significantly improving cross-architecture generalization in all settings.", "output": "Generalizing Dataset Distillation via Deep Generative Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Question answering over knowledge bases is considered a difficult problem due to the challenge of generalizing to a wide variety of possible natural language questions. Additionally, the heterogeneity of knowledge base schema items between different knowledge bases often necessitates specialized training for different knowledge base question-answering (KBQA) datasets. 
To handle questions over diverse KBQA datasets with a unified training-free framework, we propose KB-BINDER, which for the first time enables few-shot in-context learning over KBQA tasks. Firstly, KB-BINDER leverages large language models like Codex to generate logical forms as the draft for a specific question by imitating a few demonstrations. Secondly, KB-BINDER grounds on the knowledge base to bind the generated draft to an executable one with BM25 score matching. The experimental results on four public heterogeneous KBQA datasets show that KB-BINDER can achieve a strong performance with only a few in-context demonstrations. Especially on GraphQA and 3-hop MetaQA, KB-BINDER can even outperform the state-of-the-art trained models. On GrailQA and WebQSP, our model is also on par with other fully-trained models. We believe KB-BINDER can serve as an important baseline for future research. Our code is available at", "output": "Few-shot In-context Learning for Knowledge Base Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concepts benefit natural language understanding but are far from complete in existing knowledge graphs (KGs). Recently, pre-trained language models (PLMs) have been widely used in text-based concept extraction (CE). However, PLMs tend to mine the co-occurrence associations from massive corpus as pre-trained knowledge rather than the real causal effect between tokens. As a result, the pre-trained knowledge confounds PLMs to extract biased concepts based on spurious co-occurrence correlations, inevitably resulting in low precision. In this paper, through the lens of a Structural Causal Model (SCM), we propose equipping the PLM-based extractor with a knowledge-guided prompt as an intervention to alleviate concept bias. 
The prompt adopts the topic of the given entity from the existing knowledge in KGs to mitigate the spurious co-occurrence correlations between entities and biased concepts. Our extensive experiments on representative multilingual KG datasets justify that our proposed prompt can effectively alleviate concept bias and improve the performance of PLM-based CE models. The code has been released on", "output": "Causality-aware Concept Extraction based on Knowledge-guided Prompting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Contrastive learning has become a popular approach in natural language processing, particularly for the learning of sentence embeddings. However, the discrete nature of natural language makes it difficult to ensure the quality of positive and negative sample pairs generated through data augmentation methods. Although supervised contrastive learning can produce more accurate sample pairs with human feedback labels, it still lacks fine-grained training signals. In this paper, we propose to improve Contrastive Learning of sentence embeddings from AI Feedback (CLAIF). Our method utilizes AI feedback from large pre-trained language models (LLMs) to construct sample pairs with fine-grained sample similarity scores to improve contrastive learning. Besides, we combine human feedback and AI feedback to provide better supervision signals for supervised contrastive learning of sentence embeddings. 
Experimental results show that our method achieves state-of-the-art performance on several semantic textual similarity (STS) and transfer learning tasks compared to other unsupervised and supervised contrastive learning methods.", "output": "Improving Contrastive Learning of Sentence Embeddings from AI Feedback."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Discrete reasoning over table-text documents (e.g., financial reports) gains increasing attention in recent two years. Existing works mostly simplify this challenge by manually selecting and transforming document pages to structured tables and paragraphs, hindering their practical application. In this work, we explore a more realistic problem setting in the form of TAT-DQA, i.e. to answer the question over a visually-rich table-text document. Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability by harnessing the differences and correlations among different elements (e.g., quantities, dates) of the given question and document with Semantic-oriented hierarchical Graph structures. 
We conduct extensive experiments on TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score respectively on the test set, achieving the new state-of-the-art.", "output": "Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents with Semantic-Oriented Hierarchical Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent introduction of ChatGPT has drawn significant attention from both industry and academia due to its impressive capabilities in solving a diverse range of tasks, including language translation, text summarization, and computer programming. Its capability for writing, modifying, and even correcting code together with its ease of use and access is already dramatically impacting computer science education. This paper aims to explore how well ChatGPT can perform in an introductory-level functional language programming course. In our systematic evaluation, we treated ChatGPT as one of our students and demonstrated that it can achieve a grade B- and its rank in the class is 155 out of 314 students overall. Our comprehensive evaluation provides valuable insights into ChatGPT's impact from both student and instructor perspectives. Additionally, we identify several potential benefits that ChatGPT can offer to both groups. 
Overall, we believe that this study significantly clarifies and advances our understanding of ChatGPT's capabilities and potential impact on computer science education.", "output": "Can ChatGPT Pass An Introductory Level Functional Language Programming Course?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of antibody engineering, an essential task is to design a novel antibody whose paratopes bind to a specific antigen with correct epitopes. Understanding antibody structure and its paratope can facilitate a mechanistic understanding of its function. Therefore, antibody structure prediction from its sequence alone has always been a highly valuable problem for de novo antibody design. AlphaFold2, a breakthrough in the field of structural biology, provides a solution to predict protein structure based on protein sequences and computationally expensive coevolutionary multiple sequence alignments (MSAs). However, the computational efficiency and undesirable prediction accuracy of antibodies, especially on the complementarity-determining regions (CDRs) of antibodies limit their applications in the industrially high-throughput drug design. To learn an informative representation of antibodies, we employed a deep antibody language model (ALM) on curated sequences from the observed antibody space database via a transformer model. We also developed a novel model named xTrimoABFold to predict antibody structure from antibody sequence based on the pretrained ALM as well as efficient evoformers and structural modules. The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss. 
xTrimoABFold outperforms AlphaFold2 and other protein language model based SOTAs, e.g., OmegaFold, HelixFold-Single, and IgFold with a large significant margin (30+% improvement on RMSD) while performing 151 times faster than AlphaFold2. To the best of our knowledge, xTrimoABFold achieved state-of-the-art antibody structure prediction. Its improvement in both accuracy and efficiency makes it a valuable tool for de novo antibody design and could make further improvements in immuno-theory.", "output": "xTrimoABFold: De novo Antibody Structure Prediction without MSA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method for learning topology-preserving data representations (dimensionality reduction). The method aims to provide topological similarity between the data manifold and its latent representation via enforcing the similarity in topological features (clusters, loops, 2D voids, etc.) and their localization. The core of the method is the minimization of the Representation Topology Divergence (RTD) between original high-dimensional data and low-dimensional representation in latent space. RTD minimization provides closeness in topological features with strong theoretical guarantees. We develop a scheme for RTD differentiation and apply it as a loss term for the autoencoder. 
The proposed method \"RTD-AE\" better preserves the global structure and topology of the data manifold than state-of-the-art competitors as measured by linear correlation, triplet distance ranking accuracy, and Wasserstein distance between persistence barcodes.", "output": "Learning Topology-Preserving Data Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised skeleton-based action recognition enjoys a rapid growth along with the development of contrastive learning. The existing methods rely on imposing invariance to augmentations of 3D skeleton within a single data stream, which merely leverages the easy positive pairs and limits the ability to explore the complicated movement patterns. In this paper, we advocate that the defect of single-stream contrast and the lack of necessary feature transformation are responsible for easy positives, and therefore propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR). Specifically, the proposed CSCLR not only utilizes intra-stream contrast pairs, but introduces inter-stream contrast pairs as hard samples to formulate a better representation learning. Besides, to further exploit the potential of positive pairs and increase the robustness of self-supervised representation learning, we propose a Positive Feature Transformation (PFT) strategy which adopts feature-level manipulation to increase the variance of positive pairs. To validate the effectiveness of our method, we conduct extensive experiments on three benchmark datasets NTU-RGB+D 60, NTU-RGB+D 120 and PKU-MMD. 
Experimental results show that our proposed CSCLR exceeds the state-of-the-art methods on a diverse range of evaluation protocols.", "output": "Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Culling newly hatched male chicks in industrial hatcheries poses a serious ethical problem. Both laying and broiler breeders need males, but it is a problem because they are produced more than needed. Being able to determine the sex of chicks in the egg at the beginning or early stage of incubation can eliminate ethical problems as well as many additional costs. When we look at the literature, the methods used are very costly, low in applicability, invasive, inadequate in accuracy, or too late to eliminate ethical problems. Considering the embryo's development, the earliest observed candidate feature for sex determination is blood vessels. Detection from blood vessels can eliminate ethical issues, and these vessels can be seen when light is shined into the egg until the first seven days. In this study, sex determination was made by morphological analysis from embryonic vascular images obtained in the first week when the light was shined into the egg using a standard camera without any invasive procedure to the egg.", "output": "Sex Detection in the Early Stage of Fertilized Chicken Eggs via Image Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Coral reefs are fast-changing and complex ecosystems that are crucial to monitor and study. Biological hotspot detection can help coral reef managers prioritize limited resources for monitoring and intervention tasks. 
Here, we explore the use of autonomous underwater vehicles (AUVs) with cameras, coupled with visual detectors and photogrammetry, to map and identify these hotspots. This approach can provide high spatial resolution information in fast feedback cycles. To the best of our knowledge, we present one of the first attempts at using an AUV to gather visually-observed, fine-grain biological hotspot maps in concert with topography of a coral reefs. Our hotspot maps correlate with rugosity, an established proxy metric for coral reef biodiversity and abundance, as well as with our visual inspections of the 3D reconstruction. We also investigate issues of scaling this approach when applied to new reefs by using these visual detectors pre-trained on large public datasets.", "output": "Biological Hotspot Mapping in Coral Reefs with Robotic Visual Surveys."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Consumers are exposed to advertisements across many different domains on the internet, such as fashion, beauty, car, food, and others. On the other hand, fashion represents second highest e-commerce shopping category. Does consumer digital record behavior on various fashion ad images reveal their fashion taste? Does ads from other domains infer their fashion taste as well? In this paper, we study the correlation between advertisements and fashion taste. Towards this goal, we introduce a new dataset, Fashionpedia-Ads, which asks subjects to provide their preferences on both ad (fashion, beauty, car, and dessert) and fashion product (social network and e-commerce style) images. Furthermore, we exhaustively collect and annotate the emotional, visual and textual information on the ad images from multi-perspectives (abstractive level, physical level, captions, and brands). 
We open-source Fashionpedia-Ads to enable future studies and encourage more approaches to interpretability research between advertisements and fashion taste.", "output": "Fashionpedia-Ads: Do Your Favorite Advertisements Reveal Your Fashion Taste?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose SimSC, a remarkably simple framework, to address the problem of semantic matching only based on the feature backbone. We discover that when fine-tuning ImageNet pre-trained backbone on the semantic matching task, L2 normalization of the feature map, a standard procedure in feature matching, produces an overly smooth matching distribution and significantly hinders the fine-tuning process. By setting an appropriate temperature to the softmax, this over-smoothness can be alleviated and the quality of features can be substantially improved. We employ a learning module to predict the optimal temperature for fine-tuning feature backbones. This module is trained together with the backbone and the temperature is updated online. We evaluate our method on three public datasets and demonstrate that we can achieve accuracy on par with state-of-the-art methods under the same backbone without using a learned matching head. Our method is versatile and works on various types of backbones. We show that the accuracy of our framework can be easily improved by coupling it with more powerful backbones.", "output": "SimSC: A Simple Framework for Semantic Correspondence with Temperature Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. 
Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences for images with small to medium view point changes, for large view point changes, matching semantically on the object-level becomes advantageous. In this paper, we propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images. We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network. We demonstrate our approach in a large variety of views on realistically rendered synthetic images. Our approach compares favorably to previous state-of-the-art object-level matching approaches and achieves improved performance over a pure keypoint-based approach for large view-point changes.", "output": "Learning-based Relational Object Matching Across Views."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. 
We evaluated the ability of these techniques to improve model generalization to various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address generalization due to domain shift in real world applications.", "output": "Synthetic DOmain-Targeted Augmentation (S-DOTA) Improves Model Generalization in Digital Pathology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The mobile cloud gaming industry has been rapidly growing over the last decade. When streaming gaming videos are transmitted to customers' client devices from cloud servers, algorithms that can monitor distorted video quality without having any reference video available are desirable tools. However, creating No-Reference Video Quality Assessment (NR VQA) models that can accurately predict the quality of streaming gaming videos rendered by computer graphics engines is a challenging problem, since gaming content generally differs statistically from naturalistic videos, often lacks detail, and contains many smooth regions. Until recently, the problem has been further complicated by the lack of adequate subjective quality databases of mobile gaming content. 
We have created a new gaming-specific NR VQA model called the Gaming Video Quality Evaluator (GAMIVAL), which combines and leverages the advantages of spatial and temporal gaming distorted scene statistics models, a neural noise model, and deep semantic features. Using a support vector regression (SVR) as a regressor, GAMIVAL achieves superior performance on the new LIVE-Meta Mobile Cloud Gaming (LIVE-Meta MCG) video quality database.", "output": "GAMIVAL: Video Quality Prediction on Mobile Cloud Gaming Content."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at ", "output": "Shap-E: Generating Conditional 3D Implicit Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Background. 
Clinical parameters measured from gated single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) have value in predicting cardiac resynchronization therapy (CRT) patient outcomes, but still show limitations. The purpose of this study is to combine clinical variables, features from electrocardiogram (ECG), and parameters from assessment of cardiac function with polarmaps from gated SPECT MPI through deep learning (DL) to predict CRT response. Methods. 218 patients who underwent rest gated SPECT MPI were enrolled in this study. CRT response was defined as an increase in left ventricular ejection fraction (LVEF) > 5% at a 6-month follow up. A DL model was constructed by combining a pre-trained VGG16 module and a multilayer perceptron. Two modalities of data were input to the model: polarmap images from SPECT MPI and tabular data from clinical features and ECG parameters. Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the VGG16 module to provide explainability for the polarmaps. For comparison, four machine learning (ML) models were trained using only the tabular features. Results. Modeling was performed on 218 patients who underwent CRT implantation with a response rate of 55.5% (n = 121). The DL model demonstrated average AUC (0.83), accuracy (0.73), sensitivity (0.76), and specificity (0.69) surpassing the ML models and guideline criteria. Guideline recommendations presented accuracy (0.53), sensitivity (0.75), and specificity (0.26). Conclusions. The DL model outperformed the ML models, showcasing the additional predictive benefit of utilizing SPECT MPI polarmaps. 
Incorporating additional patient data directly in the form of medical imagery can improve CRT response prediction.", "output": "A new method using deep learning to predict the response to cardiac resynchronization therapy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this study, our goal is to show the impact of self-supervised pre-training of transformers for organ at risk (OAR) and tumor segmentation as compared to costly fully-supervised learning. The proposed algorithm is called Monte Carlo Transformer based U-Net (MC-Swin-U). Unlike many other available models, our approach presents uncertainty quantification with Monte Carlo dropout strategy while generating its voxel-wise prediction. We test and validate the proposed model on both public and one private datasets and evaluate the gross tumor volume (GTV) as well as nearby risky organs' boundaries. We show that self-supervised pre-training approach improves the segmentation scores significantly while providing additional benefits for avoiding large-scale annotation costs.", "output": "Self-Supervised Learning for Organs At Risk and Tumor Segmentation with Uncertainty Quantification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI tasks encompass a wide range of domains and fields. While numerous AI models have been designed for specific tasks and applications, they often require considerable human efforts in finding the right model architecture, optimization algorithm, and hyperparameters. Recent advances in large language models (LLMs) like ChatGPT show remarkable capabilities in various aspects of reasoning, comprehension, and interaction. 
Consequently, we propose developing task-oriented prompts and automatically utilizing LLMs to automate the training pipeline. To implement this concept, we present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters. AutoML-GPT dynamically takes user requests from the model and data cards and composes the corresponding prompt paragraph. Ultimately, with this prompt paragraph, AutoML-GPT will automatically conduct the experiments from data processing to model architecture, hyperparameter tuning, and predicted training log. By leveraging {ours}'s robust language capabilities and the available AI models, AutoML-GPT can tackle numerous intricate AI tasks across various tasks and datasets. This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas. Extensive experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many AI tasks.", "output": "AutoML-GPT: Automatic Machine Learning with GPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Logo detection plays an integral role in many applications. However, handling small logos is still difficult since they occupy too few pixels in the image, which burdens the extraction of discriminative features. The aggregation of small logos also brings a great challenge to the classification and localization of logos. To solve these problems, we creatively propose Cross-direction Task Decoupling Network (CTDNet) for small logo detection. We first introduce Cross-direction Feature Pyramid (CFP) to realize cross-direction feature fusion by adopting horizontal transmission and vertical transmission. In addition, Multi-frequency Task Decoupling Head (MTDH) decouples the classification and localization tasks into two branches. 
A multifrequency attention convolution branch is designed to achieve more accurate regression by combining discrete cosine transform and convolution creatively. Comprehensive experiments on four logo datasets demonstrate the effectiveness and efficiency of the proposed method.", "output": "A Cross-direction Task Decoupling Network for Small Logo Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Residual networks have shown great success and become indispensable in recent deep neural network models. In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits. Previous research has suggested that residual networks can be considered as ensembles of shallow networks, which implies that the final performance of a residual network is influenced by a group of subnetworks. We identify a previously overlooked problem that is analogous to social loafing, where subnetworks within a residual network are prone to exert less effort when working as part of a group compared to working alone. We define this problem as network loafing. Similar to the decreased individual productivity and overall performance as demonstrated in society, network loafing inevitably causes sub-par performance. Inspired by solutions from social psychology, we first propose a novel training scheme called stimulative training, which randomly samples a residual subnetwork and calculates the KL divergence loss between the sampled subnetwork and the given residual network for extra supervision. 
In order to unleash the potential of stimulative training, we further propose three simple-yet-effective strategies, including a novel KL-loss that only aligns the network logits direction, random smaller inputs for subnetworks, and inter-stage sampling rules. Comprehensive experiments and analysis verify the effectiveness of stimulative training as well as its three improved strategies.", "output": "Stimulative Training++: Go Beyond The Performance Limits of Residual Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Magnetic resonance imaging (MRI) is known to have reduced signal-to-noise ratios (SNR) at lower field strengths, leading to signal degradation when producing a low-field MRI image from a high-field one. Therefore, reconstructing a high-field-like image from a low-field MRI is a complex problem due to the ill-posed nature of the task. Additionally, obtaining paired low-field and high-field MR images is often not practical. We theoretically uncovered that the combination of these challenges renders conventional deep learning methods that directly learn the mapping from a low-field MR image to a high-field MR image unsuitable. To overcome these challenges, we introduce a novel meta-learning approach that employs a teacher-student mechanism. Firstly, an optimal-transport-driven teacher learns the degradation process from high-field to low-field MR images and generates pseudo-paired high-field and low-field MRI images. Then, a score-based student solves the inverse problem of reconstructing a high-field-like MR image from a low-field MRI within the framework of iterative regularization, by learning the joint distribution of pseudo-paired images to act as a regularizer. 
Experimental results on real low-field MRI data demonstrate that our proposed method outperforms state-of-the-art unpaired learning methods.", "output": "Meta-Learning Enabled Score-Based Generative Model for 1.5T-Like Image Reconstruction from 0.5T MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Building benchmarks to systemically analyze different capabilities of video question answering (VideoQA) models is challenging yet crucial. Existing benchmarks often use non-compositional simple questions and suffer from language biases, making it difficult to diagnose model weaknesses incisively. A recent benchmark AGQA poses a promising paradigm to generate QA pairs automatically from pre-annotated scene graphs, enabling it to measure diverse reasoning abilities with granular control. However, its questions have limitations in reasoning about the fine-grained semantics in videos as such information is absent in its scene graphs. To this end, we present ANetQA, a large-scale benchmark that supports fine-grained compositional reasoning over the challenging untrimmed videos from ActivityNet. Similar to AGQA, the QA pairs in ANetQA are automatically generated from annotated video scene graphs. The fine-grained properties of ANetQA are reflected in the following: (i) untrimmed videos with fine-grained semantics; (ii) spatio-temporal scene graphs with fine-grained taxonomies; and (iii) diverse questions generated from fine-grained templates. ANetQA attains 1.4 billion unbalanced and 13.4 million balanced QA pairs, which is an order of magnitude larger than AGQA with a similar number of videos. Comprehensive experiments are performed for state-of-the-art methods. 
The best model achieves 44.5% accuracy while human performance tops out at 84.5%, leaving sufficient room for improvement.", "output": "ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D scene flow estimation aims to estimate point-wise motions between two consecutive frames of point clouds. Superpoints, i.e., points with similar geometric features, are usually employed to capture similar motions of local regions in 3D scenes for scene flow estimation. However, in existing methods, superpoints are generated with the offline clustering methods, which cannot characterize local regions with similar motions for complex 3D scenes well, leading to inaccurate scene flow estimation. To this end, we propose an iterative end-to-end superpoint based scene flow estimation framework, where the superpoints can be dynamically updated to guide the point-level flow prediction. Specifically, our framework consists of a flow guided superpoint generation module and a superpoint guided flow refinement module. In our superpoint generation module, we utilize the bidirectional flow information at the previous iteration to obtain the matching points of points and superpoint centers for soft point-to-superpoint association construction, in which the superpoints are generated for pairwise point clouds. With the generated superpoints, we first reconstruct the flow for each point by adaptively aggregating the superpoint-level flow, and then encode the consistency between the reconstructed flow of pairwise point clouds. Finally, we feed the consistency encoding along with the reconstructed flow into GRU to refine point-level flow. 
Extensive experiments on several different datasets show that our method can achieve promising performance.", "output": "Self-Supervised 3D Scene Flow Estimation Guided by Superpoints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Coronary CT angiography (CCTA) scans are widely used for diagnosis of coronary artery diseases. An accurate and automatic vessel labeling algorithm for CCTA analysis can significantly improve the diagnostic efficiency and reduce the clinicians' manual efforts. In this paper, we propose a simple vessel labeling method based on the Point Transformer, which only needs the coronary artery segmentation. Specifically, firstly, the coronary segmentation is transformed to point cloud. Then, these points are fed into the hierarchical transformer blocks to obtain the multi-level features, including local and global features. Finally, the network output the semantic classification points and map them to centerline labeling. This method is only based on the structure of coronary segmentation and need not other features, so it is easy to generalize to other vessel labeling tasks, e.g., head and neck vessel labeling. To evaluate the performance of our proposed method, CCTA scans of 53 subjects are collected in our experiment. The experimental results demonstrate the efficacy of this approach.", "output": "Point Transformer For Coronary Artery Labeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Predicting human scanpaths when exploring panoramic videos is a challenging task due to the spherical geometry and the multimodality of the input, and the inherent uncertainty and diversity of the output. Most previous methods fail to give a complete treatment of these characteristics, and thus are prone to errors. 
In this paper, we present a simple new criterion for scanpath prediction based on principles from lossy data compression. This criterion suggests minimizing the expected code length of quantized scanpaths in a training set, which corresponds to fitting a discrete conditional probability model via maximum likelihood. Specifically, the probability model is conditioned on two modalities: a viewport sequence as the deformation-reduced visual input and a set of relative historical scanpaths projected onto respective viewports as the aligned path input. The probability model is parameterized by a product of discretized Gaussian mixture models to capture the uncertainty and the diversity of scanpaths from different users. Most importantly, the training of the probability model does not rely on the specification of \"ground-truth\" scanpaths for imitation learning. We also introduce a proportional-integral-derivative (PID) controller-based sampler to generate realistic human-like scanpaths from the learned probability model. Experimental results demonstrate that our method consistently produces better quantitative scanpath results in terms of prediction accuracy (by comparing to the assumed \"ground-truths\") and perceptual realism (through machine discrimination) over a wide range of prediction horizons. We additionally verify the perceptual realism improvement via a formal psychophysical experiment and the generalization improvement on several unseen panoramic video datasets.", "output": "Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The popular VQ-VAE models reconstruct images through learning a discrete codebook but suffer from a significant issue in the rapid quality degradation of image reconstruction as the compression rate rises. 
One major reason is that a higher compression rate induces more loss of visual signals on the higher frequency spectrum which reflect the details on pixel space. In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information for enhancing reconstruction quality. The FCM can be easily incorporated into the VQ-VAE structure, and we refer to the new model as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL) is introduced to guide the FCMs to balance between various frequencies dynamically for optimal reconstruction. FA-VAE is further extended to the text-to-image synthesis task, and a Cross-attention Autoregressive Transformer (CAT) is proposed to obtain more precise semantic attributes in texts. Extensive reconstruction experiments with different compression rates are conducted on several benchmark datasets, and the results demonstrate that the proposed FA-VAE is able to restore more faithfully the details compared to SOTA methods. CAT also shows improved generation quality with better image-text semantic alignment.", "output": "Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss. 
The graph contrastive objective maximizes the agreement of multimodal representations, providing a natural interplay for all modalities without special customization. In addition, we extract image features within the bounding box that joins a pair of tokens connected by a graph edge, capturing more targeted visual cues without loading a sophisticated and separately pre-trained image embedder. FormNetV2 establishes new state-of-the-art performance on FUNSD, CORD, SROIE and Payment benchmarks with a more compact model size.", "output": "FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic layout generation that can synthesize high-quality layouts is an important tool for graphic design in many applications. Though existing methods based on generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) have progressed, they still leave much room for improving the quality and diversity of the results. Inspired by the recent success of diffusion models in generating high-quality images, this paper explores their potential for conditional layout generation and proposes Transformer-based Layout Diffusion Model (LayoutDM) by instantiating the conditional denoising diffusion probabilistic model (DDPM) with a purely transformer-based architecture. 
Instead of using convolutional neural networks, a transformer-based conditional Layout Denoiser is proposed to learn the reverse diffusion process to generate samples from noised layout data. Benefitting from both transformer and DDPM, our LayoutDM has desirable properties such as high-quality generation, strong sample diversity, faithful distribution coverage, and stationary training in comparison to GANs and VAEs. Quantitative and qualitative experimental results show that our method outperforms state-of-the-art generative models in terms of quality and diversity.", "output": "LayoutDM: Transformer-based Diffusion Model for Layout Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI-enhanced segmentation of neuronal boundaries in electron microscopy (EM) images is crucial for automatic and accurate neuroinformatics studies. To enhance the limited generalization ability of typical deep learning frameworks for medical image analysis, unsupervised domain adaptation (UDA) methods have been applied. In this work, we propose to improve the performance of UDA methods on cross-domain neuron membrane segmentation in EM images. First, we designed a feature weight module considering the structural features during adaptation. Second, we introduced a structural feature-based super-resolution approach to alleviate the domain gap by adjusting the cross-domain image resolutions. Third, we proposed an orthogonal decomposition module to facilitate the extraction of domain-invariant features. 
Extensive experiments on two domain adaptive membrane segmentation applications have indicated the effectiveness of our method.", "output": "Unsupervised Domain Adaptation for Neuron Membrane Segmentation based on Structural Features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, emotional talking face generation has received considerable attention. However, existing methods only adopt one-hot coding, image, or audio as emotion conditions, thus lacking flexible control in practical applications and failing to handle unseen emotion styles due to limited semantics. They either ignore the one-shot setting or the quality of generated faces. In this paper, we propose a more flexible and generalized framework. Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space, which inherits rich semantic prior from CLIP. Consequently, effective multi-modal emotion space learning helps our method support arbitrary emotion modality during testing and could generalize to unseen emotion styles. Besides, an Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to structural representation. A subsequent style-based High-fidelity Emotional Face generator is designed to generate arbitrary high-resolution realistic identities. 
Our texture generator hierarchically learns flow fields and animated faces in a residual manner. Extensive experiments demonstrate the flexibility and generalization of our method in emotion control and the effectiveness of high-quality face synthesis.", "output": "High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text reading order is a crucial aspect in the output of an OCR engine, with a large impact on downstream tasks. Its difficulty lies in the large variation of domain specific layout structures, and is further exacerbated by real-world image degradations such as perspective distortions. We propose a lightweight, scalable and generalizable approach to identify text reading order with a multi-modal, multi-task graph convolutional network (GCN) running on a sparse layout based graph. Predictions from the model provide hints of bidimensional relations among text lines and layout region structures, upon which a post-processing cluster-and-sort algorithm generates an ordered sequence of all the text lines. The model is language-agnostic and runs effectively across multi-language datasets that contain various types of images taken in uncontrolled conditions, and it is small enough to be deployed on virtually any platform including mobile devices.", "output": "Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image coding for machines (ICM) aims to compress images to support downstream AI analysis instead of human perception. 
For ICM, developing a unified codec to reduce information redundancy while empowering the compressed features to support various vision tasks is very important, which inevitably faces two core challenges: 1) How should the compression strategy be adjusted based on the downstream tasks? 2) How can the compressed features be well adapted to different downstream tasks? Inspired by recent advances in transferring large-scale pre-trained models to downstream tasks via prompting, in this work, we explore a new ICM framework, termed Prompt-ICM. We address both challenges by carefully learning task-driven prompts to coordinate well the compression process and downstream analysis. Specifically, our method is composed of two core designs: a) compression prompts, which are implemented as importance maps predicted by an information selector, and used to achieve different content-weighted bit allocations during compression according to different downstream tasks; b) task-adaptive prompts, which are instantiated as a few learnable parameters specifically for tuning compressed features for the specific intelligent task. Extensive experiments demonstrate that with a single feature codec and a few extra parameters, our proposed framework could efficiently support different kinds of intelligent tasks with much higher coding efficiency.", "output": "Prompt-ICM: A Unified Framework towards Image Coding for Machines with Task-driven Prompts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image compression techniques typically focus on compressing rectangular images for human consumption, however, resulting in transmitting redundant content for downstream applications. 
To overcome this limitation, some previous works propose to semantically structure the bitstream, which can meet specific application requirements by selective transmission and reconstruction. Nevertheless, they divide the input image into multiple rectangular regions according to semantics and ignore avoiding information interaction among them, causing waste of bitrate and distorted reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate saving by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, this paper proposes the concept of group-independent transform that maintains the independence among distinct groups. We instantiate it by the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream with negligible cost, and exhibits superior performance on both visual quality and intelligent task supporting.", "output": "Semantically Structured Image Compression via Irregular Group-Based Decoupling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. 
Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.", "output": "How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal-driven talking face generation refers to animating a portrait with the given pose, expression, and gaze transferred from the driving image and video, or estimated from the text and audio. However, existing methods ignore the potential of the text modality, and their generators mainly follow the source-oriented feature rearrange paradigm coupled with unstable GAN frameworks. In this work, we first represent the emotion in the text prompt, which could inherit rich semantics from the CLIP, allowing flexible and generalized emotion control. We further reorganize these tasks as the target-oriented texture transfer and adopt the Diffusion Models. 
More specifically, given a textured face as the source and the rendered face projected from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model (TGDM) decomposes the complex transfer problem into a multi-conditional denoising process, where a Texture Attention-based module accurately models the correspondences between appearance and geometry cues contained in source and target conditions, and incorporates extra implicit information for high-fidelity talking face generation. Additionally, TGDM can be gracefully tailored for face swapping. We derive a novel paradigm free of unstable seesaw-style optimization, resulting in simple, stable, and effective training and inference schemes. Extensive experiments demonstrate the superiority of our method.", "output": "Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image retrieval plays an important role in the Internet world. Usually, the core parts of mainstream visual retrieval systems include an online service of the embedding model and a large-scale vector database. For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data. Recently, backward-compatible training (BCT) enables the new model to be immediately deployed online by making the new embeddings directly comparable to the old ones. For BCT, improving the compatibility of two models with less negative impact on retrieval performance is the key challenge. In this paper, we introduce AdvBCT, an Adversarial Backward-Compatible Training method with an elastic boundary constraint that takes both compatibility and discrimination into consideration. 
We first employ adversarial learning to minimize the distribution disparity between embeddings of the new model and the old model. Meanwhile, we add an elastic boundary constraint during training to improve compatibility and discrimination efficiently. Extensive experiments on GLDv2, Revisited Oxford (ROxford), and Revisited Paris (RParis) demonstrate that our method outperforms other BCT methods on both compatibility and discrimination. The implementation of AdvBCT will be publicly available at", "output": "Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural radiance fields (NeRF) based methods have shown amazing performance in synthesizing 3D-consistent photographic images, but fail to generate multi-view portrait drawings. The key is that the basic assumption of these methods -- a surface point is consistent when rendered from different views -- doesn't hold for drawings. In a portrait drawing, the appearance of a facial point may change when viewed from different angles. Besides, portrait drawings usually present little 3D information and suffer from insufficient training data. To combat this challenge, in this paper, we propose a Semantic-Aware GEnerator (SAGE) for synthesizing multi-view portrait drawings. Our motivation is that facial semantic labels are view-consistent and correlate with drawing techniques. We therefore propose to collaboratively synthesize multi-view semantic maps and the corresponding portrait drawings. To facilitate training, we design a semantic-aware domain translator, which generates portrait drawings based on features of photographic faces. In addition, we use data augmentation via synthesis to mitigate collapsed results. We apply SAGE to synthesize multi-view portrait drawings in diverse artistic styles. 
Experimental results show that SAGE achieves significantly superior or highly competitive performance, compared to existing 3D-aware image synthesis methods. The codes are available at ", "output": "Semantic-aware Generation of Multi-view Portrait Drawings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present the UrbanBIS benchmark for large-scale 3D urban understanding, supporting practical urban-level semantic and building-level instance segmentation. UrbanBIS comprises six real urban scenes, with 2.5 billion points, covering a vast area of 10.78 square kilometers and 3,370 buildings, captured by 113,346 views of aerial photogrammetry. Particularly, UrbanBIS provides not only semantic-level annotations on a rich set of urban objects, including buildings, vehicles, vegetation, roads, and bridges, but also instance-level annotations on the buildings. Further, UrbanBIS is the first 3D dataset that introduces fine-grained building sub-categories, considering a wide variety of shapes for different building types. Besides, we propose B-Seg, a building instance segmentation method to establish UrbanBIS. B-Seg adopts an end-to-end framework with a simple yet effective strategy for handling large-scale point clouds. Compared with mainstream methods, B-Seg achieves better accuracy with faster inference speed on UrbanBIS. 
In addition to the carefully-annotated point clouds, UrbanBIS provides high-resolution aerial-acquisition photos and high-quality large-scale 3D reconstruction models, which shall facilitate a wide range of studies such as multi-view stereo, urban LOD generation, aerial path planning, autonomous navigation, road network extraction, and so on, thus serving as an important platform for many intelligent city applications.", "output": "UrbanBIS: a Large-scale Benchmark for Fine-grained Urban Building Instance Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neuroimage processing tasks like segmentation, reconstruction, and registration are central to the study of neuroscience. Robust deep learning strategies and architectures used to solve these tasks are often similar. Yet, when presented with a new task or a dataset with different visual characteristics, practitioners most often need to train a new model, or fine-tune an existing one. This is a time-consuming process that poses a substantial barrier for the thousands of neuroscientists and clinical researchers who often lack the resources or machine-learning expertise to train deep learning models. In practice, this leads to a lack of adoption of deep learning, and neuroscience tools being dominated by classical frameworks. We introduce Neuralizer, a single model that generalizes to previously unseen neuroimaging tasks and modalities without the need for re-training or fine-tuning. Tasks do not have to be known a priori, and generalization happens in a single forward pass during inference. The model can solve processing tasks across multiple image modalities, acquisition methods, and datasets, and generalize to tasks and modalities it has not been trained on. 
Our experiments on coronal slices show that when few annotated subjects are available, our multi-task network outperforms task-specific baselines without training on the task.", "output": "Neuralizer: General Neuroimage Analysis without Re-Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video depth estimation is crucial in various applications, such as scene reconstruction and augmented reality. In contrast to the naive method of estimating depths from images, a more sophisticated approach uses temporal information, thereby eliminating flickering and geometrical inconsistencies. We propose a consistent method for dense video depth estimation; however, unlike the existing monocular methods, ours relates to stereo videos. This technique overcomes the limitations arising from the monocular input. As a benefit of using stereo inputs, a left-right consistency loss is introduced to improve the performance. Besides, we use SLAM-based camera pose estimation in the process. To address the problem of depth blurriness during test-time training (TTT), we present an edge-preserving loss function that improves the visibility of fine details while preserving geometrical consistency. We show that our edge-aware stereo video model can accurately estimate the dense depth maps.", "output": "Edge-aware Consistent Stereo Video Depth Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This article introduces Point2Tree, a novel framework that incorporates a three-stage process involving semantic segmentation, instance segmentation, and optimization analysis of hyperparameter importance. It introduces a comprehensive and modular approach to processing laser point clouds in forestry. We tested it on two independent datasets. 
The first area was located in an actively managed boreal coniferous dominated forest in Våler, Norway, where 16 circular plots of 400 square meters were selected to cover a range of forest conditions in terms of species composition and stand density. We trained a model based on the Pointnet++ architecture which achieves a 0.92 F1-score in semantic segmentation. As a second step in our pipeline we used a graph-based approach for instance segmentation which reached an F1-score of approx. 0.6. The optimization further boosted the performance of the pipeline by approx. 4 percentage points.", "output": "Point2Tree(P2T) -- framework for parameter tuning of semantic and instance segmentation used with mobile laser scanning data in coniferous forest."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video super-resolution (VSR) techniques, especially deep-learning-based algorithms, have drastically improved over the last few years and shown impressive performance on synthetic data. However, their performance on real-world video data suffers because of the complexity of real-world degradations and misaligned video frames. Since obtaining a synthetic dataset consisting of low-resolution (LR) and high-resolution (HR) frames is easier than obtaining real-world LR and HR images, in this paper, we propose synthesizing real-world degradations on synthetic training datasets. The proposed synthetic real-world degradations (SRWD) include a combination of blur, noise, downsampling, pixel binning, and image and video compression artifacts. We then propose using a random shuffling-based strategy to simulate these degradations on the training datasets and train a single end-to-end deep neural network (DNN) on the proposed larger variation of realistic synthesized training data. 
Our quantitative and qualitative comparative analysis shows that the proposed training strategy using diverse realistic degradations improves the performance by 7.1% in terms of NRQM compared to RealBasicVSR and by 3.34% compared to BSRGAN on the VideoLQ dataset. We also introduce a new dataset that contains high-resolution real-world videos that can serve as a common ground for benchmarking.", "output": "Expanding Synthetic Real-World Degradations for Blind Video Super Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although data augmentation is a powerful technique for improving the performance of image classification tasks, it is difficult to identify the best augmentation policy. The optimal augmentation policy, which is the latent variable, cannot be directly observed. To address this problem, this study proposes LatentAugment, which estimates the latent probability of optimal augmentation. The proposed method is appealing in that it can dynamically optimize the augmentation strategies for each input and model parameter in learning iterations. Theoretical analysis shows that LatentAugment is a general model that includes other augmentation methods as special cases, and it is simple and computationally efficient in comparison with existing augmentation methods. Experimental results show that the proposed LatentAugment has higher test accuracy than previous augmentation methods on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.", "output": "LatentAugment: Dynamically Optimized Latent Probabilities of Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed. 
Humans can abstract away the action from the appearance of the objects and their context which is referred to as compositionality of actions. Compositional action recognition deals with imparting human-like compositional generalization abilities to action-recognition models. In this regard, extracting the interactions between humans and objects forms the basis of compositional understanding. These interactions are not affected by the appearance biases of the objects or the context. But the context provides additional cues about the interactions between things and stuff. Hence we need to infuse context into the human-object interactions for compositional action recognition. To this end, we first design a spatial-temporal interaction encoder that captures the human-object (things) interactions. The encoder learns the spatio-temporal interaction tokens disentangled from the background context. The interaction tokens are then infused with contextual information from the video tokens to model the interactions between things and stuff. The final context-infused spatio-temporal interaction tokens are used for compositional action recognition. We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset where we obtain a new state-of-the-art result of 83.8% top-1 accuracy outperforming recent important object-centric methods by a significant margin. 
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets such as Something-Something-V2 and Epic-Kitchens-100 where we obtain comparable or better performance than state-of-the-art.", "output": "Modelling Spatio-Temporal Interactions for Compositional Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on annotated pairs of input controls and output captions. However, the scarcity of such well-annotated multimodal data largely limits their usability and scalability for interactive AI systems. Leveraging unimodal instruction-following foundation models is a promising alternative that benefits from broader sources of data. In this paper, we present Caption AnyThing (CAT), a foundation model augmented image captioning framework supporting a wide range of multimodal controls: 1) visual controls, including points, boxes, and trajectories; 2) language controls, such as sentiment, length, language, and factuality. Powered by Segment Anything Model (SAM) and ChatGPT, we unify the visual and language prompts into a modularized framework, enabling the flexible combination between different controls. Extensive case studies demonstrate the user intention alignment capabilities of our framework, shedding light on effective user interaction modeling in vision-language applications. 
Our code is publicly available at", "output": "Caption Anything: Interactive Image Description with Diverse Multimodal Controls."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples at multiple levels. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. 
Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home, demonstrate that our proposed method achieves state-of-the-art performance in SSDA.", "output": "Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transforming a design into a high-quality product is a challenge in metal additive manufacturing due to rare events which can cause defects to form. Detecting these events in-situ could, however, reduce inspection costs, enable corrective action, and is the first step towards a future of tailored material properties. In this study a model is trained on laser input information to predict nominal laser melting conditions. An anomaly score is then calculated by taking the difference between the predictions and new observations. The model is evaluated on a dataset with known defects achieving an F1 score of 0.821. This study shows that anomaly detection methods are an important tool in developing robust defect detection methods.", "output": "In-situ Anomaly Detection in Additive Manufacturing with Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The purpose of this study is to develop a computer-aided diagnosis system for classifying benign and malignant lung lesions, and to assist physicians in real-time analysis of radial probe endobronchial ultrasound (EBUS) videos. During the biopsy process of lung cancer, physicians use real-time ultrasound images to find suitable lesion locations for sampling. However, most of these images are difficult to classify and contain a lot of noise. 
Previous studies have employed 2D convolutional neural networks to effectively differentiate between benign and malignant lung lesions, but doctors still need to manually select good-quality images, which can result in additional labor costs. In addition, the 2D neural network has no ability to capture the temporal information of the ultrasound video, so it is difficult to obtain the relationship between the features of the continuous images. This study designs an automatic diagnosis system based on a 3D neural network, uses the SlowFast architecture as the backbone to fuse temporal and spatial features, and uses the SwAV method of contrastive learning to enhance the noise robustness of the model. The proposed method has the following advantages: (1) it uses clinical ultrasound films as model input, thereby reducing the need for high-quality image selection by physicians; (2) its high-accuracy classification of benign and malignant lung lesions can assist doctors in clinical diagnosis and reduce the time and risk of surgery; and (3) it classifies well even in the presence of significant image noise. The AUC, accuracy, precision, recall and specificity of our proposed method on the validation set reached 0.87, 83.87%, 86.96%, 90.91% and 66.67%, respectively. The results have verified the importance of incorporating temporal information and the effectiveness of using the method of contrastive learning on feature extraction.", "output": "Using Spatio-Temporal Dual-Stream Network with Self-Supervised Learning for Lung Tumor Classification on Radial Probe Endobronchial Ultrasound Video."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge distillation is an effective paradigm for boosting the performance of pocket-size models; especially when multiple teacher models are available, the student can break the upper limit again. 
However, it is not economical to train diverse teacher models for the disposable distillation. In this paper, we introduce a new concept dubbed Avatars for distillation, which are the inference ensemble models derived from the teacher. Concretely, (1) For each iteration of distillation training, various Avatars are generated by a perturbation transformation. We validate that Avatars own higher upper limit of working capacity and teaching ability, aiding the student model in learning diverse and receptive knowledge perspectives from the teacher model. (2) During the distillation, we propose an uncertainty-aware factor from the variance of statistical differences between the vanilla teacher and Avatars, to adjust Avatars' contribution on knowledge transfer adaptively. Avatar Knowledge Distillation AKD is fundamentally different from existing methods and refines with the innovative view of unequal training. Comprehensive experiments demonstrate the effectiveness of our Avatars mechanism, which polishes up the state-of-the-art distillation methods for dense prediction without more extra computational cost. The AKD brings at most 0.7 AP gains on COCO 2017 for Object Detection and 1.83 mIoU gains on Cityscapes for Semantic Segmentation, respectively.", "output": "Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most micro- and macro-expression spotting methods in untrimmed videos suffer from the burden of video-wise collection and frame-wise annotation. Weakly-supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation while achieving fine-grained frame-level spotting. 
However, we argue that existing weakly-supervised methods are based on multiple instance learning (MIL) involving inter-modality, inter-sample, and inter-task gaps. The inter-sample gap is primarily from the sample distribution and duration. Therefore, we propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms that include modal-level saliency, video-level distribution, label-level duration and segment-level feature consistency strategies to implement fine frame-level spotting with only video-level labels to alleviate the above gaps and merge prior knowledge. The modal-level saliency consistency strategy focuses on capturing key correlations between raw images and optical flow. The video-level distribution consistency strategy utilizes the difference of sparsity in temporal distribution. The label-level duration consistency strategy exploits the difference in the duration of facial muscles. The segment-level feature consistency strategy emphasizes that features under the same labels maintain similarity. Experimental results on two challenging datasets -- CAS(ME)$^2$ and SAMM-LV -- demonstrate that MC-WES is comparable to state-of-the-art fully-supervised methods.", "output": "Weakly-supervised Micro- and Macro-expression Spotting Based on Multi-level Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D semantic scene graphs are a powerful holistic representation as they describe the individual objects and depict the relation between them. They are compact high-level graphs that enable many tasks requiring scene reasoning. In real-world settings, existing 3D estimation methods produce robust predictions that mostly rely on dense inputs. In this work, we propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence. 
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network. The proposed pipeline simultaneously reconstructs a sparse point map and fuses entity estimation from the input images. The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities. Extensive experiments on the 3RScan dataset show the effectiveness of the proposed method in this challenging task, outperforming state-of-the-art approaches.", "output": "Incremental 3D Semantic Scene Graph Prediction from RGB Sequences."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we study face verification in datasets where images of the same individuals exhibit significant age differences. This poses a major challenge for current face recognition and verification techniques. To address this issue, we propose a novel approach that utilizes multitask learning and a Wasserstein distance discriminator to disentangle age and identity embeddings of facial images. Our approach employs multitask learning with a Wasserstein distance discriminator that minimizes the mutual information between the age and identity embeddings by minimizing the Jensen-Shannon divergence. This improves the encoding of age and identity information in face images and enhances the performance of face verification in age-variant datasets. 
We evaluate the effectiveness of our approach using multiple age-variant face datasets and demonstrate its superiority over state-of-the-art methods in terms of face verification accuracy.", "output": "Age-Invariant Face Embedding using the Wasserstein Distance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "NeRF acquisition typically requires careful choice of near planes for the different cameras or suffers from background collapse, creating floating artifacts on the edges of the captured scene. The key insight of this work is that background collapse is caused by a higher density of samples in regions near cameras. As a result of this sampling bias, near-camera volumes receive significantly more gradients, leading to incorrect density buildup. We propose a gradient scaling approach to counter-balance this bias, removing the need for near planes, while preventing background collapse. Our method can be implemented in a few lines, does not induce any significant overhead, and is compatible with most NeRF implementations.", "output": "Radiance Field Gradient Scaling for Unbiased Near-Camera Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, many convolutional neural network-based models are designed for JPEG artifacts reduction, and have achieved notable progress. However, few methods are suitable for extreme low-bitrate image compression artifacts reduction. 
The main challenge is that the highly compressed image loses too much information, resulting in reconstructing high-quality image difficultly. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifacts reduction, in which the corresponding text description not only provides the potential prior information of the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from the global and local perspectives respectively, and design a contrastive loss built upon contrastive learning to produce visually pleasing results. Extensive experiments, including a user study, prove that our method can obtain better deblocking results compared to the state-of-the-art methods.", "output": "Multi-Modality Deep Network for JPEG Artifacts Reduction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Table detection is the task of classifying and localizing table objects within document images. With the recent development in deep learning methods, we observe remarkable success in table detection. However, a significant amount of labeled data is required to train these models effectively. Many semi-supervised approaches are introduced to mitigate the need for a substantial amount of label data. These approaches use CNN-based detectors that rely on anchor proposals and post-processing stages such as NMS. To tackle these limitations, this paper presents a novel end-to-end semi-supervised table detection method that employs the deformable transformer for detecting table objects. We evaluate our semi-supervised method on PubLayNet, DocBank, ICDAR-19 and TableBank datasets, and it achieves superior performance compared to previous methods. 
It outperforms the fully supervised method (Deformable transformer) by +3.4 points on 10% labels of TableBank-both dataset and the previous CNN-based semi-supervised approach (Soft Teacher) by +1.8 points on 10% labels of PubLayNet dataset. We hope this work opens new possibilities towards semi-supervised and unsupervised table detection methods.", "output": "Towards End-to-End Semi-Supervised Table Detection with Deformable Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-modal Magnetic Resonance Imaging (MRI) plays an important role in clinical medicine. However, the acquisitions of some modalities, such as the T2-weighted modality, need a long time and they are always accompanied by motion artifacts. On the other hand, the T1-weighted image (T1WI) shares the same underlying information with T2-weighted image (T2WI), which needs a shorter scanning time. Therefore, in this paper we accelerate the acquisition of the T2WI by introducing the auxiliary modality (T1WI). Concretely, we first reconstruct high-quality T2WIs with under-sampled T2WIs. Here, we realize fast T2WI reconstruction by reducing the sampling rate in the k-space. Second, we establish a cross-modal synthesis task to generate the synthetic T2WIs for guiding better T2WI reconstruction. Here, we obtain the synthetic T2WIs by decomposing the whole cross-modal generation mapping into two OT processes, the spatial alignment mapping on the T1 image manifold and the cross-modal synthesis mapping from aligned T1WIs to T2WIs. It overcomes the negative transfer caused by the spatial misalignment. Then, we prove the reconstruction and the synthesis tasks are well complementary. 
Finally, we compare it with state-of-the-art approaches on an open dataset FastMRI and an in-house dataset to testify the validity of the proposed method.", "output": "Spatial and Modal Optimal Transport for Fast Cross-Modal MRI Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The aim of this paper is to present a mathematical framework for tensor PCA. The proposed approach is able to overcome the limitations of previous methods that extract a low dimensional subspace by iteratively solving an optimization problem. The core of the proposed approach is the derivation of a basis in tensor space from a real self-adjoint tensor operator, thus reducing the problem of deriving a basis to an eigenvalue problem. Three different cases have been studied to derive: i) a basis from a self-adjoint tensor operator; ii) a rank-1 basis; iii) a basis in a subspace. In particular, the equivalence between eigenvalue equation for a real self-adjoint tensor operator and standard matrix eigenvalue equation has been proven. For all the three cases considered, a subspace approach has been adopted to derive a tensor PCA. Experiments on image datasets validate the proposed mathematical framework.", "output": "Tensor PCA from basis in tensor space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning has proven to be effective in improving the performance of correlated tasks. Most of the existing methods use a backbone to extract initial features with independent branches for each task, and the exchange of information between the branches usually occurs through the concatenation or sum of the feature maps of the branches. 
However, this type of information exchange does not directly consider the local characteristics of the image nor the level of importance or correlation between the tasks. In this paper, we propose a semantic segmentation method, MTLSegFormer, which combines multi-task learning and attention mechanisms. After the backbone feature extraction, two feature maps are learned for each task. The first map is proposed to learn features related to its task, while the second map is obtained by applying learned visual attention to locally re-weigh the feature maps of the other tasks. In this way, weights are assigned to local regions of the image of other tasks that have greater importance for the specific task. Finally, the two maps are combined and used to solve a task. We tested the performance in two challenging problems with correlated tasks and observed a significant improvement in accuracy, mainly in tasks with high dependence on the others.", "output": "MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal emotion recognition identifies human emotions from various data modalities like video, text, and audio. However, we found that this task can be easily affected by noisy information that does not contain useful semantics. To this end, we present a novel paradigm that attempts to extract noise-resistant features in its pipeline and introduces a noise-aware learning scheme to effectively improve the robustness of multimodal emotion understanding. Our new pipeline, namely Noise-Resistant Multimodal Transformer (NORM-TR), mainly introduces a Noise-Resistant Generic Feature (NRGF) extractor and a Transformer for the multimodal emotion recognition task. 
In particular, we make the NRGF extractor learn a generic and disturbance-insensitive representation so that consistent and meaningful semantics can be obtained. Furthermore, we apply a Transformer to incorporate Multimodal Features (MFs) of multimodal inputs based on their relations to the NRGF. Therefore, the possible insensitive but useful information of NRGF could be complemented by MFs that contain more details. To train the NORM-TR properly, our proposed noise-aware learning scheme complements normal emotion recognition losses by enhancing the learning against noises. Our learning scheme explicitly adds noises to either all the modalities or a specific modality at random locations of a multimodal input sequence. We correspondingly introduce two adversarial losses to encourage the NRGF extractor to learn to extract the NRGFs invariant to the added noises, thus facilitating the NORM-TR to achieve more favorable multimodal emotion recognition performance. In practice, on several popular multimodal datasets, our NORM-TR achieves state-of-the-art performance and outperforms existing methods by a large margin, which demonstrates that the ability to resist noisy information is important for effective emotion recognition.", "output": "Noise-Resistant Multimodal Transformer for Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To study whether it is possible to differentiate intermediate age-related macular degeneration (AMD) from healthy controls using partial optical coherence tomography (OCT) data, that is, restricting the input B-scans to certain pre-defined regions of interest (ROIs). A total of 15744 B-scans from 269 intermediate AMD patients and 115 normal subjects were used in this study (split on subject level in 80% train, 10% validation and 10% test). 
From each OCT B-scan, three ROIs were extracted: retina, complex between retinal pigment epithelium (RPE) and Bruch membrane (BM), and choroid (CHO). These ROIs were obtained using two different methods: masking and cropping. In addition to the six ROIs, the whole OCT B-scan and the binary mask corresponding to the segmentation of the RPE-BM complex were used. For each subset, a convolutional neural network (based on VGG16 architecture and pre-trained on ImageNet) was trained and tested. The performance of the models was evaluated using the area under the receiver operating characteristic (AUROC), accuracy, sensitivity, and specificity. All trained models presented an AUROC, accuracy, sensitivity, and specificity equal to or higher than 0.884, 0.816, 0.685, and 0.644, respectively. The model trained on the whole OCT B-scan presented the best performance (AUROC = 0.983, accuracy = 0.927, sensitivity = 0.862, specificity = 0.913). The models trained on the ROIs obtained with the cropping method led to significantly higher outcomes than those obtained with masking, with the exception of the retinal tissue, where no statistically significant difference was observed between cropping and masking (p = 0.47). This study demonstrated that while using the complete OCT B-scan provided the highest accuracy in classifying intermediate AMD, models trained on specific ROIs such as the RPE-BM complex or the choroid can still achieve high performance.", "output": "Comparison of different retinal regions-of-interest imaged by OCT for the classification of intermediate AMD."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although empathic interaction between counselor and client is fundamental to success in the psychotherapeutic process, there are currently few datasets to aid a computational approach to empathy understanding. 
In this paper, we construct a multimodal empathy dataset collected from face-to-face psychological counseling sessions. The dataset consists of 771 video clips. We also propose three labels (i.e., expression of experience, emotional reaction, and cognitive reaction) to describe the degree of empathy between counselors and their clients. Expression of experience describes whether the client has expressed experiences that can trigger empathy, and emotional and cognitive reactions indicate the counselor's empathic reactions. As an elementary assessment of the usability of the constructed multimodal empathy dataset, an interrater reliability analysis of annotators' subjective evaluations for video clips is conducted using the intraclass correlation coefficient and Fleiss' Kappa. Results prove that our data annotation is reliable. Furthermore, we conduct empathy prediction using three typical methods, including the tensor fusion network, the sentimental words aware fusion network, and a simple concatenation model. The experimental results show that empathy can be well predicted on our dataset. Our dataset is available for research purposes.", "output": "MEDIC: A Multimodal Empathy Dataset in Counseling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Binary Neural Networks (BNNs) use 1-bit weights and activations to efficiently execute deep convolutional neural networks on edge devices. Nevertheless, the binarization of the first layer is conventionally excluded, as it leads to a large accuracy loss. The few works addressing the first layer binarization, typically increase the number of input channels to enhance data representation; such data expansion raises the amount of operations needed and it is feasible only on systems with enough computational resources. 
In this work, we present a new method to binarize the first layer using directly the 8-bit representation of input data; we exploit the standard bit-planes encoding to extract features bit-wise (using depth-wise convolutions); after a re-weighting stage, features are fused again. The resulting model is fully binarized and our first layer binarization approach is model independent. The concept is evaluated on three classification datasets (CIFAR10, SVHN and CIFAR100) for different model architectures (VGG and ResNet) and the proposed technique outperforms state of the art methods both in accuracy and BMACs reduction.", "output": "Input Layer Binarization with Bit-Plane Encoding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Driver monitoring systems (DMS) are a key component of vehicular safety and essential for the transition from semiautonomous to fully autonomous driving. A key task for DMS is to ascertain the cognitive state of a driver and to determine their level of tiredness. Neuromorphic vision systems, based on event camera technology, provide advanced sensing of facial characteristics, in particular the behavior of a driver's eyes. This research explores the potential to extend neuromorphic sensing techniques to analyze the entire facial region, detecting yawning behaviors that give a complementary indicator of tiredness. A neuromorphic dataset is constructed from 952 video clips (481 yawns, 471 not-yawns) captured with an RGB color camera, with 37 subjects. A total of 95200 neuromorphic image frames are generated from this video data using a video-to-event converter. From these data 21 subjects were selected to provide a training dataset, 8 subjects were used for validation data, and the remaining 8 subjects were reserved for an \"unseen\" test dataset. 
An additional 12300 frames were generated from event simulations of a public dataset to test against other methods. A CNN with self-attention and a recurrent head was designed, trained, and tested with these data. Respective precision and recall scores of 95.9 percent and 94.7 percent were achieved on our test set, and 89.9 percent and 91 percent on the simulated public test set, demonstrating the feasibility to add yawn detection as a sensing component of a neuromorphic DMS.", "output": "Neuromorphic Sensing for Yawn Detection in Driver Drowsiness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. 
We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes.", "output": "APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new model, training procedure and architecture to create precise maps of distinction between two classes of images. The objective is to comprehend, in pixel-wise resolution, the unique characteristics of a class. These maps can facilitate self-supervised segmentation and object detection in addition to new capabilities in explainable AI (XAI). Our proposed architecture is based on image decomposition, where the output is the sum of multiple generative networks (branched-GANs). The distinction between classes is isolated in a dedicated branch. This approach allows clear, precise and interpretable visualization of the unique characteristics of each class. We show how our generic method can be used in several modalities for various tasks, such as MRI brain tumor extraction, isolating cars in aerial photography and obtaining feminine and masculine face features. This is a preliminary report of our initial findings and results.", "output": "Additive Class Distinction Maps using Branched-GANs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Low-resolution point clouds are challenging for object detection methods due to their sparsity. Densifying the present point cloud by concatenating it with its predecessors is a popular solution to this challenge. 
Such concatenation is possible thanks to the removal of ego vehicle motion using its odometry. This method is called Ego Motion Compensation (EMC). Thanks to the added points, EMC significantly improves the performance of single-frame detectors. However, it suffers from the shadow effect that manifests in dynamic objects' points scattering along their trajectories. This effect results in a misalignment between feature maps and objects' locations, thus limiting performance improvement to stationary and slow-moving objects only. Scene flow allows aligning point clouds in 3D space, thus naturally resolving the misalignment in feature spaces. By observing that scene flow computation shares several components with 3D object detection pipelines, we develop a plug-in module that enables single-frame detectors to compute scene flow to rectify their Bird-Eye View representation. Experiments on the NuScenes dataset show that our module leads to a significant increase (up to 16%) in the Average Precision of large vehicles, which interestingly demonstrates the most severe shadow effect. The code is published at ", "output": "Aligning Bird-Eye View Representation of Point Cloud Sequences using Scene Flow."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Urban Physical Disorder (UPD), such as old or abandoned buildings, broken sidewalks, litter, and graffiti, has a negative impact on residents' quality of life. They can also increase crime rates, cause social disorder, and pose a public health risk. Currently, there is a lack of efficient and reliable methods for detecting and understanding UPD. 
To bridge this gap, we propose UPDExplainer, an interpretable transformer-based framework for UPD detection. We first develop a UPD detection model based on the Swin Transformer architecture, which leverages readily accessible street view images to learn discriminative representations. In order to provide clear and comprehensible evidence and analysis, we subsequently introduce a UPD factor identification and ranking module that combines visual explanation maps with semantic segmentation maps. This novel integrated approach enables us to identify the exact objects within street view images that are responsible for physical disorders and gain insights into the underlying causes. Experimental results on the re-annotated Place Pulse 2.0 dataset demonstrate promising detection performance of the proposed method, with an accuracy of 79.9%. For a comprehensive evaluation of the method's ranking performance, we report the mean Average Precision (mAP), R-Precision (RPrec), and Normalized Discounted Cumulative Gain (NDCG), with success rates of 75.51%, 80.61%, and 82.58%, respectively. We also present a case study of detecting and ranking physical disorders in the southern region of downtown Los Angeles, California, to demonstrate the practicality and effectiveness of our framework.", "output": "UPDExplainer: an Interpretable Transformer-based Framework for Urban Physical Disorder Detection Using Street View Imagery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image classification is one of the most important tasks for computer-aided diagnosis. Deep learning models, particularly convolutional neural networks, have been successfully used for disease classification from medical images, facilitated by automated feature learning. 
However, the diverse imaging modalities and clinical pathology make it challenging to construct generalized and robust classifications. Towards improving the model performance, we propose a novel pretraining approach, namely Forward Forward Contrastive Learning (FFCL), which leverages the Forward-Forward Algorithm in a contrastive learning framework--both locally and globally. Our experimental results on the chest X-ray dataset indicate that the proposed FFCL achieves superior performance (3.69% accuracy over ImageNet pretrained ResNet-18) over existing pretraining models in the pneumonia classification task. Moreover, extensive ablation experiments support the particular local and global contrastive pretraining design in FFCL.", "output": "Forward-Forward Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image captioning, a.k.a. \"image-to-text,\" which generates descriptive text from given images, has been rapidly developing throughout the era of deep learning. To what extent is the information in the original image preserved in the descriptive text generated by an image captioner? To answer that question, we have performed experiments involving the classification of images from descriptive text alone, without referring to the images at all, and compared results with those from standard image-based classifiers. We have evaluated several image captioning models with respect to a disaster image classification task, CrisisNLP, and show that descriptive text classifiers can sometimes achieve higher accuracy than standard image-based classifiers. 
Further, we show that fusing an image-based classifier with a descriptive text classifier can provide improvement in accuracy.", "output": "Image Captioners Sometimes Tell More Than Images They See."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents FUSegNet, a new model for foot ulcer segmentation in diabetes patients, which uses the pre-trained EfficientNet-b7 as a backbone to address the issue of limited training samples. A modified spatial and channel squeeze-and-excitation (scSE) module called parallel scSE or P-scSE is proposed that combines additive and max-out scSE. A new arrangement is introduced for the module by fusing it in the middle of each decoder stage. As the top decoder stage carries a limited number of feature maps, max-out scSE is bypassed there to form a shorted P-scSE. A set of augmentations, comprising geometric, morphological, and intensity-based augmentations, is applied before feeding the data into the network. The proposed model is first evaluated on a publicly available chronic wound dataset where it achieves a data-based dice score of 92.70%, which is the highest score among the reported approaches. The model outperforms other scSE-based UNet models in terms of Pratt's figure of merits (PFOM) scores in most categories, which evaluates the accuracy of edge localization. The model is then tested in the MICCAI 2021 FUSeg challenge, where a variation of FUSegNet called x-FUSegNet is submitted. The x-FUSegNet model, which takes the average of outputs obtained by FUSegNet using 5-fold cross-validation, achieves a dice score of 89.23%, placing it at the top of the FUSeg Challenge leaderboard. 
The source code for the model is available on", "output": "FUSegNet: A Deep Convolutional Neural Network for Foot Ulcer Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a method for generating alpha mattes using a limited data source. We pretrain a novel transformer-based model (StyleMatte) on portrait datasets. We utilize this model to provide image-mask pairs for the StyleGAN3-based network (StyleMatteGAN). This network is trained unsupervisedly and generates previously unseen image-mask training pairs that are fed back to StyleMatte. We demonstrate that the performance of the matte pulling network improves during this cycle and obtains top results on the used datasets. Furthermore, StyleMatteGAN provides high-resolution, privacy-preserving portraits with alpha mattes, making it suitable for various image composition tasks. Our code is available at ", "output": "Adversarially-Guided Portrait Matting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding the performance of machine learning (ML) models across diverse data distributions is critically important for reliable applications. Despite recent empirical studies positing a near-perfect linear correlation between in-distribution (ID) and out-of-distribution (OOD) accuracies, we empirically demonstrate that this correlation is more nuanced under subpopulation shifts. Through rigorous experimentation and analysis across a variety of datasets, models, and training epochs, we demonstrate that OOD performance often has a nonlinear correlation with ID performance in subpopulation shifts. 
Our findings, which contrast previous studies that have posited a linear correlation in model performance during distribution shifts, reveal a \"moon shape\" correlation (parabolic uptrend curve) between the test performance on the majority subpopulation and the minority subpopulation. This non-trivial nonlinear correlation holds across model architectures, hyperparameters, training durations, and the imbalance between subpopulations. Furthermore, we found that the nonlinearity of this \"moon shape\" is causally influenced by the degree of spurious correlations in the training data. Our controlled experiments show that stronger spurious correlation in the training data creates more nonlinear performance correlation. We provide complementary experimental and theoretical analyses for this phenomenon, and discuss its implications for ML reliability and fairness. Our work highlights the importance of understanding the nonlinear effects of model improvement on performance in different subpopulations, and has the potential to inform the development of more equitable and responsible machine learning models.", "output": "On the nonlinear correlation of ML performance between data subpopulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For driverless train operation on mainline railways, several tasks need to be implemented by technical systems. One of the most challenging tasks is to monitor the train's driveway and its surroundings for potential obstacles due to long braking distances. Machine learning algorithms can be used to analyze data from vision sensors such as infrared (IR) and visual (RGB) cameras, lidars, and radars to detect objects. Such algorithms require large amounts of annotated data from objects in the rail environment that may pose potential obstacles, as well as rail-specific objects such as tracks or catenary poles, as training data. 
However, only very few datasets are publicly available and these available datasets typically involve only a limited number of sensors. Datasets and trained models from other domains, such as automotive, are useful but insufficient for object detection in the railway context. Therefore, this publication presents OSDaR23, a multi-sensor dataset of 21 sequences captured in Hamburg, Germany, in September 2021. The sensor setup consisted of multiple calibrated and synchronized IR/RGB cameras, lidars, a radar, and position and acceleration sensors front-mounted on a railway vehicle. In addition to raw data, the dataset contains 204091 polyline, polygonal, rectangle and cuboid annotations for 20 different object classes. This dataset can also be used for tasks going beyond collision prediction, which are listed in this paper.", "output": "OSDaR23: Open Sensor Data for Rail 2023."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite Convolutional Neural Networks having reached human-level performance in some medical tasks, their clinical use has been hindered by their lack of interpretability. Two major interpretability strategies have been proposed to tackle this problem: post-hoc methods and intrinsic methods. Although there are several post-hoc methods to interpret DL models, there is significant variation between the explanations provided by each method, and it is difficult to validate them due to the lack of ground-truth. To address this challenge, we adapted the intrinsically interpretable ProtoPNet for the context of histopathology imaging and compared the attribution maps produced by it and the saliency maps made by post-hoc methods. 
To evaluate the similarity between saliency map methods and attribution maps we adapted 10 saliency metrics from the saliency model literature, and used the breast cancer metastases detection dataset PatchCamelyon with 327,680 patches of histopathological images of sentinel lymph node sections to validate the proposed approach. Overall, SmoothGrad and Occlusion were found to have a statistically bigger overlap with ProtoPNet while Deconvolution and Lime have been found to have the least.", "output": "Evaluating Post-hoc Interpretability with Intrinsic Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads, from which we introduce a new human head reconstruction benchmark. The recorded sequences cover a wide range of facial dynamics, including head motions, natural expressions, emotions, and spoken language. In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles (NeRSemble). We represent scene dynamics by combining a deformation field and an ensemble of 3D multi-resolution hash encodings. The deformation field allows for precise modeling of simple scene movements, while the ensemble of hash encodings helps to represent complex dynamics. As a result, we obtain radiance field representations of human heads that capture motion over time and facilitate re-rendering of arbitrary novel viewpoints. 
In a series of experiments, we explore the design choices of our method and demonstrate that our approach outperforms state-of-the-art dynamic radiance field approaches by a significant margin.", "output": "NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain). Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels. In this work, we identify the intriguing alignment and synergy between mean-teacher self-training and contrastive learning. Motivated by this, we propose Contrastive Mean Teacher (CMT) -- a unified, general-purpose framework with the two paradigms naturally integrated to maximize beneficial learning signals. Instead of using pseudo-labels solely for final predictions, our strategy extracts object-level features using pseudo-labels and optimizes them via contrastive learning, without requiring labels in the target domain. When combined with recent mean-teacher self-training methods, CMT leads to new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previously best by 2.1% mAP. Notably, CMT can stabilize performance and provide more significant gains as pseudo-label noise increases.", "output": "Contrastive Mean Teacher for Domain Adaptive Object Detectors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision which is challenging to gather in real world at scale. 
Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interactions. In this paper, we automatically extract 3D supervision (via multiview 2D supervision) from such raw video data to scale up the learning of models for hand-held object reconstruction. This requires tackling two key challenges: unknown camera pose and occlusion. For the former, we use hand pose (predicted from existing techniques, e.g. FrankMocap) as a proxy for object pose. For the latter, we learn data-driven 3D shape priors using synthetic objects from the ObMan dataset. We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image. Our experiments on the MOW and HO3D datasets show the effectiveness of these supervisory signals at predicting the 3D shape for real-world hand-held objects without any direct real-world 3D supervision.", "output": "Learning Hand-Held Object Reconstruction from In-The-Wild Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Textures are a vital aspect of creating visually appealing and realistic 3D models. In this paper, we study the problem of generating high-fidelity texture given shapes of 3D assets, which has been relatively less explored compared with generic 3D shape modeling. Our goal is to facilitate a controllable texture generation process, such that one texture code can correspond to a particular appearance style independent of any input shapes from a category. We introduce Texture UV Radiance Fields (TUVF) that generate textures in a learnable UV sphere space rather than directly on the 3D shape. 
This allows the texture to be disentangled from the underlying shape and transferable to other shapes that share the same UV space, i.e., from the same category. We integrate the UV sphere space with the radiance field, which provides a more efficient and accurate representation of textures than traditional texture maps. We perform our experiments on real-world object datasets where we achieve not only realistic synthesis but also substantial improvements over state-of-the-arts on texture controlling and editing. Project Page: ", "output": "TUVF: Learning Generalizable Texture UV Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a growing demand for the accessible creation of high-quality 3D avatars that are animatable and customizable. Although 3D morphable models provide intuitive control for editing and animation, and robustness for single-view face reconstruction, they cannot easily capture geometric and appearance details. Methods based on neural implicit representations, such as signed distance functions (SDF) or neural radiance fields, approach photo-realism, but are difficult to animate and do not generalize well to unseen data. To tackle this problem, we propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing. Trained from a collection of high-quality 3D scans, our face model is parameterized by geometry, expression, and texture latent codes with a learned SDF and explicit UV texture parameterization. Once trained, we can reconstruct an avatar from a single in-the-wild image by leveraging the learned prior to project the image into the latent space of our model. 
Our implicit morphable face models can be used to render an avatar from novel views, animate facial expressions by modifying expression codes, and edit textures by directly painting on the learned UV-texture maps. We demonstrate quantitatively and qualitatively that our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.", "output": "Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OctFormer can not only serve as a general and effective backbone for 3D point cloud segmentation and object detection but also have linear complexity and is scalable for large-scale point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, thus overwhelming, computation complexity of attentions. To combat this issue, several works divide point clouds into non-overlapping windows and constrain attentions in each local window. However, the point number in each window varies greatly, impeding the efficient execution on GPU. Observing that attentions are robust to the shapes of local windows, we propose a novel octree attention, which leverages sorted shuffled keys of octrees to partition point clouds into local windows containing a fixed number of points while permitting shapes of windows to change freely. And we also introduce dilated octree attention to expand the receptive field further. Our octree attention can be implemented in 10 lines of code with open-sourced libraries and runs 17 times faster than other point cloud attentions when the point number exceeds 200k. 
Built upon the octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performances on a series of 3D segmentation and detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in terms of both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 in mIoU. Our code and trained models are available at", "output": "OctFormer: Octree-based Transformers for 3D Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance. 
Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at ", "output": "Personalize Segment Anything Model with One Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes NeuralEditor that enables neural radiance fields (NeRFs) natively editable for general shape editing tasks. Despite their impressive results on novel-view synthesis, it remains a fundamental challenge for NeRFs to edit the shape of the scene. Our key insight is to exploit the explicit point cloud representation as the underlying structure to construct NeRFs, inspired by the intuitive interpretation of NeRF rendering as a process that projects or \"plots\" the associated 3D point cloud to a 2D image plane. To this end, NeuralEditor introduces a novel rendering scheme based on deterministic integration within K-D tree-guided density-adaptive voxels, which produces both high-quality rendering results and precise point clouds through optimization. NeuralEditor then performs shape editing via mapping associated points between point clouds. Extensive evaluation shows that NeuralEditor achieves state-of-the-art performance in both shape deformation and scene morphing tasks. Notably, NeuralEditor supports both zero-shot inference and further fine-tuning over the edited scene. Our code, benchmark, and demo video are available at ", "output": "NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep generative models have various content creation applications such as graphic design, e-commerce, and virtual Try-on. 
However, current works mainly focus on synthesizing realistic visual outputs, often ignoring other sensory modalities, such as touch, which limits physical interaction with users. In this work, we leverage deep generative models to create a multi-sensory experience where users can touch and see the synthesized object when sliding their fingers on a haptic surface. The main challenges lie in the significant scale discrepancy between vision and touch sensing and the lack of explicit mapping from touch sensing data to a haptic rendering device. To bridge this gap, we collect high-resolution tactile data with a GelSight sensor and create a new visuotactile clothing dataset. We then develop a conditional generative model that synthesizes both visual and tactile outputs from a single sketch. We evaluate our method regarding image quality and tactile rendering accuracy. Finally, we introduce a pipeline to render high-quality visual and tactile outputs on an electroadhesion-based haptic device for an immersive experience, allowing for challenging materials and editable sketch inputs.", "output": "Controllable Visual-Tactile Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce $\\textbf{TCOW}$, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is to, given a video sequence, segment both the projected extent of the target object, as well as the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. 
We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.", "output": "Tracking through Containers and Occluders in the Wild."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining completely distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then adds them together. While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks. Thus, we introduce \"ZipIt!\", a general method for merging two arbitrary models of the same architecture that incorporates two simple strategies. First, in order to account for features that aren't shared between models, we expand the model merging problem to additionally allow for merging features within each model by defining a general \"zip\" operation. Second, we add support for partially zipping the models up until a specified layer, naturally creating a multi-head model. We find that these two changes combined account for a staggering 20-60% improvement over prior work, making the merging of models trained on disjoint tasks feasible.", "output": "ZipIt! 
Merging Models from Different Tasks without Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Network pruning and quantization are proven to be effective ways for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct network quantization based on the pruned model. However, this strategy may ignore that they would affect each other and thus performing them separately may lead to sub-optimal performance. To address this, performing pruning and quantization jointly is essential. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on some pre-defined compression configurations. Some attempts have been made to search for optimal configurations, which however may take unbearable optimization cost. To address the above issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS). Specifically, we first consider network pruning as a special case of quantization, which provides a unified view for pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations. In this way, the configuration search problem is transformed into a subset selection problem, which significantly reduces the number of parameters, computational cost and optimization difficulty. Relying on the single-path model, we further introduce learnable binary gates to encode the choice of bitwidth. By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined. Extensive experiments on both CIFAR-100 and ImageNet show that SBS is able to significantly reduce computational cost while achieving promising performance. 
For example, our SBS compressed MobileNetV2 achieves 22.6x Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.", "output": "Single-path Bit Sharing for Automatic Loss-aware Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nowadays, many of the images captured are `observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera Image Signal Processor (ISP). However, the camera ISP is optimized for producing visually pleasing images for human observers and not for machines. Therefore, one may spare the ISP compute time and apply vision models directly to RAW images. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop, we use a RAW and RGB image pairs dataset, which can be easily acquired with no human labeling. We then train a model that is applied directly to the RAW data by using knowledge distillation such that the model predictions for RAW images will be aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images. Our experiments show that our performance on RAW images for object classification and semantic segmentation is significantly better than models trained on labeled RAW images. 
It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.", "output": "ISP Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In vision domain, large-scale natural datasets typically exhibit long-tailed distribution which has large class imbalance between head and tail classes. This distribution poses difficulty in learning good representations for tail classes. Recent developments have shown good long-tailed model can be learnt by decoupling the training into representation learning and classifier balancing. However, these works pay insufficient consideration on the long-tailed effect on representation learning. In this work, we propose interpolative centroid contrastive learning (ICCL) to improve long-tailed representation learning. ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolative image can be used to retrieve the centroids for both source classes. We demonstrate the effectiveness of our approach on multiple long-tailed image classification benchmarks. Our result shows a significant accuracy gain of 2.8% on the iNaturalist 2018 dataset with a real-world long-tailed distribution.", "output": "Improving Tail-Class Representation with Centroid Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments. 
Besides parsing the articulated parts and joint parameters, researchers recently advocate learning manipulation affordance over the input shape geometry which is more task-aware and geometrically fine-grained. However, taking only passive observations as inputs, these methods ignore many hidden but important kinematic constraints (e.g., joint location and limits) and dynamic factors (e.g., joint friction and restitution), therefore losing significant accuracy for test cases with such uncertainties. In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions for quickly adapting the affordance priors to more accurate instance-specific posteriors. We conduct large-scale experiments using the PartNet-Mobility dataset and prove that our system performs better than baselines.", "output": "AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as translational vector field or SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce a pose-driven deformation field based on the linear blend skinning algorithm, which combines the blend weight field and the 3D human skeleton to produce observation-to-canonical correspondences. 
Since 3D human skeletons are more observable, they can regularize the learning of the deformation field. Moreover, the pose-driven deformation field can be controlled by input skeletal motions to generate new deformation fields to animate the canonical human model. Experiments show that our approach significantly outperforms recent human modeling methods. The code is available at", "output": "Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning has largely reshaped remote sensing (RS) research for aerial image understanding and made a great success. Nevertheless, most of the existing deep models are initialized with the ImageNet pretrained weights. Natural images inevitably present a large domain gap relative to aerial images, which probably limits the finetuning performance on downstream aerial scene tasks. This issue motivates us to conduct an empirical study of remote sensing pretraining (RSP) on aerial images. To this end, we train different networks from scratch with the help of the largest RS scene recognition dataset up to now -- MillionAID, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNN) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. Then, we investigate the impact of RSP on representative downstream tasks including scene recognition, semantic segmentation, object detection, and change detection using these CNN and vision transformer backbones. Empirical study shows that RSP can help deliver distinctive performances in scene recognition tasks and in perceiving RS related semantics such as \"Bridge\" and \"Airplane\". 
We also find that, although RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, it may still suffer from task discrepancies, where downstream tasks require different representations from scene recognition tasks. These findings call for further research efforts on both large-scale pretraining datasets and effective pretraining methods. The codes and pretrained models will be released at", "output": "An Empirical Study of Remote Sensing Pretraining."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications. However, current approaches suffer from drawbacks such as struggling with large scene deformations, inaccurate shape completion or requiring 2D point tracks. In contrast, our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering. This technique includes three components that are new in the context of non-rigid 3D reconstruction, i.e., 1) a coordinate-based and implicit neural representation for non-rigid scenes, which in conjunction with differentiable volume rendering enables an unbiased reconstruction of dynamic scenes, 2) a proof that extends the unbiased formulation of volume rendering to dynamic scenes, and 3) a novel dynamic scene flow loss, which enables the reconstruction of larger deformations by leveraging the coarse estimates of other methods. 
Results on our new dataset, which will be madepublicly available, demonstrate a clear improvement over the state of the artin terms of surface reconstruction accuracy and robustness to largedeformations.", "output": "Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document Visual Question Answering (VQA) aims to understand visually-richdocuments to answer questions in natural language, which is an emergingresearch topic for both Natural Language Processing and Computer Vision. Inthis work, we introduce a new Document VQA dataset, named TAT-DQA, whichconsists of 3,067 document pages comprising semi-structured table(s) andunstructured text as well as 16,558 question-answer pairs by extending theTAT-QA dataset. These documents are sampled from real-world financial reportsand contain lots of numbers, which means discrete reasoning capability isdemanded to answer questions on this dataset. Based on TAT-DQA, we furtherdevelop a novel model named MHST that takes into account the information inmulti-modalities, including text, layout and visual image, to intelligentlyaddress different types of questions with corresponding strategies, i.e.,extraction or reasoning. Extensive experiments show that the MHST modelsignificantly outperforms the baseline methods, demonstrating itseffectiveness. However, the performance still lags far behind that of experthumans. We expect that our new TAT-DQA dataset would facilitate the research ondeep understanding of visually-rich documents combining vision and language,especially for scenarios that require discrete reasoning. Also, we hope theproposed model would inspire researchers to design more advanced Document VQAmodels in future. 
Our dataset will be publicly available for non-commercial useat ", "output": "Towards Complex Document Understanding By Discrete Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic synthesis of realistic co-speech gestures is an increasinglyimportant yet challenging task in artificial embodied agent creation. Previoussystems mainly focus on generating gestures in an end-to-end manner, whichleads to difficulties in mining the clear rhythm and semantics due to thecomplex yet subtle harmony between speech and gestures. We present a novelco-speech gesture synthesis method that achieves convincing results both on therhythm and semantics. For the rhythm, our system contains a robust rhythm-basedsegmentation pipeline to ensure the temporal coherence between the vocalizationand gestures explicitly. For the gesture semantics, we devise a mechanism toeffectively disentangle both low- and high-level neural embeddings of speechand motion based on linguistic theory. The high-level embedding corresponds tosemantics, while the low-level embedding relates to subtle variations. Lastly,we build correspondence between the hierarchical embeddings of the speech andthe motion, resulting in rhythm- and semantics-aware gesture synthesis.Evaluations with existing objective metrics, a newly proposed rhythmic metric,and human feedback show that our method outperforms state-of-the-art systems bya clear margin.", "output": "Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce the first work to explore web-scale diffusion models forrobotics. 
DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.", "output": "DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to accelerate the joint process of physically acquiring and learning neural Bi-directional Reflectance Distribution Function (BRDF) models. While BRDF learning alone can be accelerated by meta-learning, acquisition remains slow as it relies on a mechanical process. We show that meta-learning can be extended to optimize the physical sampling pattern, too. After our method has been meta-trained for a set of fully-sampled BRDFs, it is able to quickly train on new BRDFs with up to five orders of magnitude fewer physical acquisition samples at similar quality. 
Our approach also extends to other linear and non-linear BRDF models, which we show in an extensive evaluation.", "output": "Learning to Learn and Sample BRDFs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "NASA's Solar Dynamics Observatory (SDO) mission gathers 1.4 terabytes of data each day from its geosynchronous orbit in space. SDO data includes images of the Sun captured at different wavelengths, with the primary scientific goal of understanding the dynamic processes governing the Sun. Recently, end-to-end optimized artificial neural networks (ANN) have shown great potential in performing image compression. ANN-based compression schemes have outperformed conventional hand-engineered algorithms for lossy and lossless image compression. We have designed an ad-hoc ANN-based image compression scheme to reduce the amount of data needed to be stored and retrieved on space missions studying solar dynamics. In this work, we propose an attention module to make use of both local and non-local attention mechanisms in an adversarially trained neural image compression network. We have also demonstrated the superior perceptual quality of this neural image compressor. Our proposed algorithm for compressing images downloaded from the SDO spacecraft performs better in rate-distortion trade-off than the popular currently-in-use image compression codecs such as JPEG and JPEG2000. 
In addition, we have shown that the proposed method outperforms the state-of-the-art lossy transform coding compression codec, i.e., BPG.", "output": "Attention-Based Generative Neural Image Compression on Solar Dynamics Observatory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many datasets are biased, namely they contain easy-to-learn features that are highly correlated with the target class only in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a very relevant research topic in the last years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Based on that, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data. We validate the proposed losses on standard vision datasets including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.", "output": "Unbiased Supervised Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image harmonization is a critical task in computer vision, which aims to adjust the foreground to make it compatible with the background. 
Recent worksmainly focus on using global transformations (i.e., normalization and colorcurve rendering) to achieve visual consistency. However, these models ignorelocal visual consistency and their huge model sizes limit their harmonizationability on edge devices. In this paper, we propose a hierarchical dynamicnetwork (HDNet) to adapt features from local to global view for better featuretransformation in efficient image harmonization. Inspired by the success ofvarious dynamic models, local dynamic (LD) module and mask-aware global dynamic(MGD) module are proposed in this paper. Specifically, LD matches localrepresentations between the foreground and background regions based on semanticsimilarities, then adaptively adjust every foreground local representationaccording to the appearance of its $K$-nearest neighbor background regions. Inthis way, LD can produce more realistic images at a more fine-grained level,and simultaneously enjoy the characteristic of semantic alignment. The MGDeffectively applies distinct convolution to the foreground and background,learning the representations of foreground and background regions as well astheir correlations to the global harmonization, facilitating local visualconsistency for the images much more efficiently. Experimental resultsdemonstrate that the proposed HDNet significantly reduces the total modelparameters by more than 80% compared to previous methods, while stillattaining state-of-the-art performance on the popular iHarmony4 dataset.Notably, the HDNet achieves a 4% improvement in PSNR and a 19% reduction inMSE compared to the prior state-of-the-art methods.", "output": "Hierarchical Dynamic Image Harmonization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Direct time-of-flight (dToF) sensors are promising for next-generationon-device 3D sensing. 
However, limited by manufacturing capabilities in a compact module, the dToF data has a low spatial resolution (e.g., $\\sim 20\\times 30$ for iPhone dToF), and it requires a super-resolution step before being passed to downstream tasks. In this paper, we solve this super-resolution problem by fusing the low-resolution dToF data with the corresponding high-resolution RGB guidance. Unlike the conventional RGB-guided depth enhancement approaches, which perform the fusion in a per-frame manner, we propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging. In addition, dToF sensors provide unique depth histogram information for each local patch, and we incorporate this dToF-specific feature in our network design to further alleviate spatial ambiguity. To evaluate our models on complex dynamic indoor environments and to provide a large-scale dToF sensor dataset, we introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator following the physical imaging process. We believe the methods and dataset are beneficial to a broad community as dToF depth sensing is becoming mainstream on mobile devices. Our code and data are publicly available: ", "output": "Consistent Direct Time-of-Flight Video Depth Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal abstractive summarization (MAS) aims to produce a concise summary given the multimodal data (text and vision). 
Existing studies mainly focus onhow to effectively use the visual features from the perspective of an article,having achieved impressive success on the high-resource English dataset.However, less attention has been paid to the visual features from theperspective of the summary, which may limit the model performance, especiallyin the low- and zero-resource scenarios. In this paper, we propose to improvethe summary quality through summary-oriented visual features. To this end, wedevise two auxiliary tasks including vision to summary task and masked imagemodeling task. Together with the main summarization task, we optimize the MASmodel via the training objectives of all these tasks. By these means, the MASmodel can be enhanced by capturing the summary-oriented visual features,thereby yielding more accurate summaries. Experiments on 44 languages, coveringmid-high-, low-, and zero-resource scenarios, verify the effectiveness andsuperiority of the proposed approach, which achieves state-of-the-artperformance under all scenarios. Additionally, we will contribute a large-scalemultilingual multimodal abstractive summarization (MM-Sum) dataset.", "output": "Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reasoning, as an essential ability for complex problem-solving, can provideback-end support for various real-world applications, such as medicaldiagnosis, negotiation, etc. This paper provides a comprehensive survey ofcutting-edge research on reasoning with language model prompting. We introduceresearch works with comparisons and summaries and provide systematic resourcesto help beginners. We also discuss the potential reasons for emerging suchreasoning abilities and highlight future research directions. 
Resources are available at (updated periodically).", "output": "Reasoning with Language Model Prompting: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Class-incremental learning (CIL) aims to train a classification model while the number of classes increases phase-by-phase. An inherent challenge of CIL is the stability-plasticity tradeoff, i.e., CIL models should keep stable to retain old knowledge and keep plastic to absorb new knowledge. However, none of the existing CIL models can achieve the optimal tradeoff in different data-receiving settings--where typically the training-from-half (TFH) setting needs more stability, but the training-from-scratch (TFS) needs more plasticity. To this end, we design an online learning method that can adaptively optimize the tradeoff without knowing the setting a priori. Specifically, we first introduce the key hyperparameters that influence the trade-off, e.g., knowledge distillation (KD) loss weights, learning rates, and classifier types. Then, we formulate the hyperparameter optimization process as an online Markov Decision Process (MDP) problem and propose a specific algorithm to solve it. We apply local estimated rewards and a classic bandit algorithm Exp3 to address the issues when applying online MDP methods to the CIL protocol. 
Our method consistently improves top-performing CIL methods inboth TFH and TFS settings, e.g., boosting the average accuracy of TFH and TFSby 2.2 percentage points on ImageNet-Full, compared to the state-of-the-art.", "output": "Online Hyperparameter Optimization for Class-Incremental Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI-Generated Content (AIGC) has recently gained a surge in popularity,powered by its high efficiency and consistency in production, and itscapability of being customized and diversified. The cross-modality nature ofthe representation learning mechanism in most AIGC technology allows for morefreedom and flexibility in exploring new types of art that would be impossiblein the past. Inspired by the pictogram subset of Chinese characters, weproposed PaCaNet, a CycleGAN-based pipeline for producing novel artworks thatfuse two different art types, traditional Chinese painting and calligraphy. Inan effort to produce stable and diversified output, we adopted three maintechnical innovations: 1. Using one-shot learning to increase the creativity ofpre-trained models and diversify the content of the fused images. 2.Controlling the preference over generated Chinese calligraphy by freezingrandomly sampled parameters in pre-trained models. 3. Using a regularizationmethod to encourage the models to produce images similar to Chinese paintings.Furthermore, we conducted a systematic study to explore the performance ofPaCaNet in diversifying fused Chinese painting and calligraphy, which showedsatisfying results. In conclusion, we provide a new direction of creating artsby fusing the visual information in paintings and the stroke features inChinese calligraphy. Our approach creates a unique aesthetic experience rootedin the origination of Chinese hieroglyph characters. 
It is also a unique opportunity to delve deeper into traditional artwork and, in doing so, to create a meaningful impact on preserving and revitalizing traditional heritage.", "output": "PaCaNet: A Study on CycleGAN with Transfer Learning for Diversifying Fused Chinese Painting and Calligraphy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources. This survey provides the first systematic overview of the efficient training of Transformers, covering the recent progress in acceleration arithmetic and hardware, with a focus on the former. We analyze and compare methods that save computation and memory costs for intermediate tensors during training, together with techniques on hardware/algorithm co-design. We finally discuss challenges and promising areas for future research.", "output": "A Survey on Efficient Training of Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whole Slide Images (WSIs) or histopathology images are used in digital pathology. WSIs pose great challenges to deep learning models for clinical diagnosis, owing to their size and lack of pixel-level annotations. With the recent advancements in computational pathology, newer multiple-instance learning-based models have been proposed. Multiple-instance learning for WSIs necessitates creating patches and uses the encoding of these patches for diagnosis. These models use generic pre-trained models (ResNet-50 pre-trained on ImageNet) for patch encoding. 
The recently proposed KimiaNet, a DenseNet121model pre-trained on TCGA slides, is a domain-specific pre-trained model. Thispaper shows the effect of domain-specific pre-training on WSI classification.To investigate the effect of domain-specific pre-training, we considered thecurrent state-of-the-art multiple-instance learning models, 1) CLAM, anattention-based model, and 2) TransMIL, a self-attention-based model, andevaluated the models' confidence and predictive performance in detectingprimary brain tumors - gliomas. Domain-specific pre-training improves theconfidence of the models and also achieves a new state-of-the-art performanceof WSI-based glioma subtype classification, showing a high clinicalapplicability in assisting glioma diagnosis. We will publicly share our codeand experimental results at", "output": "Domain-Specific Pre-training Improves Confidence in Whole Slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep unsupervised approaches are gathering increased attention forapplications such as pathology detection and segmentation in medical imagessince they promise to alleviate the need for large labeled datasets and aremore generalizable than their supervised counterparts in detecting any kind ofrare pathology. As the Unsupervised Anomaly Detection (UAD) literaturecontinuously grows and new paradigms emerge, it is vital to continuouslyevaluate and benchmark new methods in a common framework, in order to reassessthe state-of-the-art (SOTA) and identify promising research directions. To thisend, we evaluate a diverse selection of cutting-edge UAD methods on multiplemedical datasets, comparing them against the established SOTA in UAD for brainMRI. 
Our experiments demonstrate that newly developed feature-modeling methodsfrom the industrial and medical literature achieve increased performancecompared to previous work and set the new SOTA in a variety of modalities anddatasets. Additionally, we show that such methods are capable of benefitingfrom recently developed self-supervised pre-training algorithms, furtherincreasing their performance. Finally, we perform a series of experiments inorder to gain further insights into some unique characteristics of selectedmodels and datasets. Our code can be found under", "output": "Unsupervised Pathology Detection: A Deep Dive Into the State of the Art."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In autonomous driving, data augmentation is commonly used for improving 3Dobject detection. The most basic methods include insertion of copied objectsand rotation and scaling of the entire training frame. Numerous variants havebeen developed as well. The existing methods, however, are considerably limitedwhen compared to the variety of the real world possibilities. In this work, wedevelop a diversified and realistic augmentation method that can flexiblyconstruct a whole-body object, freely locate and rotate the object, and applyself-occlusion and external-occlusion accordingly. To improve the diversity ofthe whole-body object construction, we develop an iterative method thatstochastically combines multiple objects observed from the real world into asingle object. 
Unlike the existing augmentation methods, the constructedobjects can be randomly located and rotated in the training frame becauseproper occlusions can be reflected to the whole-body objects in the final step.Finally, proper self-occlusion at each local object level andexternal-occlusion at the global frame level are applied using the Hidden PointRemoval (HPR) algorithm that is computationally efficient. HPR is also used foradaptively controlling the point density of each object according to theobject's distance from the LiDAR. Experiment results show that the proposedDR.CPO algorithm is data-efficient and model-agnostic without incurring anycomputational overhead. Also, DR.CPO can improve mAP performance by 2.08% whencompared to the best 3D detection result known for KITTI dataset. The code isavailable at ", "output": "DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a simple pairwise sigmoid loss for image-text pre-training. Unlikestandard contrastive learning with softmax normalization, the sigmoid lossoperates solely on image-text pairs and does not require a global view of thepairwise similarities for normalization. The sigmoid loss simultaneously allowsfurther scaling up the batch size, while also performing better at smallerbatch sizes. With only four TPUv4 chips, we can train a Base CLIP model at 4kbatch size and a Large LiT model at 20k batch size, the latter achieves 84.5%ImageNet zero-shot accuracy in two days. This disentanglement of the batch sizefrom the loss further allows us to study the impact of examples vs pairs andnegative to positive ratio. 
Finally, we push the batch size to the extreme, up to one million, and find that the benefits of growing batch size quickly diminish, with a more reasonable batch size of 32k being sufficient. We hope our research motivates further explorations in improving the quality and efficiency of language-image pre-training.", "output": "Sigmoid Loss for Language Image Pre-Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a study on the soft-Dice loss, one of the most popular loss functions in medical image segmentation, for situations where noise is present in target labels. In particular, the set of optimal solutions is characterized and sharp bounds on the volume bias of these solutions are provided. It is further shown that a sequence of soft segmentations converging to optimal soft-Dice also converges to optimal Dice when converted to hard segmentations using thresholding. This is an important result because soft-Dice is often used as a proxy for maximizing the Dice metric. Finally, experiments confirming the theoretical results are provided.", "output": "Noisy Image Segmentation With Soft-Dice."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rotated bounding boxes drastically reduce output ambiguity of elongated objects, making them superior to axis-aligned bounding boxes. Despite the effectiveness, rotated detectors are not widely employed. Annotating rotated bounding boxes is such a laborious process that they are not provided in many detection datasets where axis-aligned annotations are used instead. 
In thispaper, we propose a framework that allows the model to predict precise rotatedboxes only requiring cheaper axis-aligned annotation of the target dataset 1.To achieve this, we leverage the fact that neural networks are capable oflearning richer representation of the target domain than what is utilized bythe task. The under-utilized representation can be exploited to address a moredetailed task. Our framework combines task knowledge of an out-of-domain sourcedataset with stronger annotation and domain knowledge of the target datasetwith weaker annotation. A novel assignment process and projection loss are usedto enable the co-training on the source and target datasets. As a result, themodel is able to solve the more detailed task in the target domain, withoutadditional computation overhead during inference. We extensively evaluate themethod on various target datasets including fresh-produce dataset, HRSC2016 andSSDD. Results show that the proposed method consistently performs on par withthe fully supervised approach.", "output": "Knowledge Combination to Learn Rotated Detection Without Rotated Annotation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in generative artificial intelligence (AI) have capturedworldwide attention. Tools such as Dalle-2 and ChatGPT suggest that taskspreviously thought to be beyond the capabilities of AI may now augment theproductivity of creative media in various new ways, including through thegeneration of synthetic video. This research paper explores the utility ofusing AI-generated synthetic video to create viable educational content foronline educational settings. To date, there is limited research investigatingthe real-world educational value of AI-generated synthetic media. 
To addressthis gap, we examined the impact of using AI-generated synthetic video in anonline learning platform on both learners content acquisition and learningexperience. We took a mixed-method approach, randomly assigning adult learners(n=83) into one of two micro-learning conditions, collecting pre- andpost-learning assessments, and surveying participants on their learningexperience. The control condition included a traditionally produced instructorvideo, while the experimental condition included a synthetic video with arealistic AI-generated character. The results show that learners in bothconditions demonstrated significant improvement from pre- to post-learning(p<.001), with no significant differences in gains between the two conditions(p=.80). In addition, no differences were observed in how learners perceivedthe traditional and synthetic videos. These findings suggest that AI-generatedsynthetic learning videos have the potential to be a viable substitute forvideos produced via traditional methods in online educational settings, makinghigh quality educational content more accessible across the globe.", "output": "Generative AI for learning: Investigating the potential of synthetic learning videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents a study on label noise in medical image segmentation byconsidering a noise model based on Gaussian field deformations. Such noise isof interest because it yields realistic looking segmentations and because it isunbiased in the sense that the expected deformation is the identity mapping.Efficient methods for sampling and closed form solutions for the marginalprobabilities are provided. Moreover, theoretically optimal solutions to theloss functions cross-entropy and soft-Dice are studied and it is shown how theydiverge as the level of noise increases. 
Based on recent work on loss functioncharacterization, it is shown that optimal solutions to soft-Dice can berecovered by thresholding solutions to cross-entropy with a particular a prioriunknown threshold that efficiently can be computed. This raises the questionwhether the decrease in performance seen when using cross-entropy as comparedto soft-Dice is caused by using the wrong threshold. The hypothesis isvalidated in 5-fold studies on three organ segmentation problems from theTotalSegmentor data set, using 4 different strengths of noise. The results showthat changing the threshold leads the performance of cross-entropy to go fromsystematically worse than soft-Dice to similar or better results thansoft-Dice.", "output": "Marginal Thresholding in Noisy Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face anti-spoofing (FAS) is an essential mechanism for safeguarding theintegrity of automated face recognition systems. Despite substantialadvancements, the generalization of existing approaches to real-worldapplications remains challenging. This limitation can be attributed to thescarcity and lack of diversity in publicly available FAS datasets, which oftenleads to overfitting during training or saturation during testing. In terms ofquantity, the number of spoof subjects is a critical determinant. Most datasetscomprise fewer than 2,000 subjects. With regard to diversity, the majority ofdatasets consist of spoof samples collected in controlled environments usingrepetitive, mechanical processes. This data collection methodology results inhomogenized samples and a dearth of scenario diversity. To address theseshortcomings, we introduce the Wild Face Anti-Spoofing (WFAS) dataset, alarge-scale, diverse FAS dataset collected in unconstrained settings. 
Our dataset encompasses 853,729 images of 321,751 spoof subjects and 529,571 images of 148,169 live subjects, representing a substantial increase in quantity. Moreover, our dataset incorporates spoof data obtained from the internet, spanning a wide array of scenarios and various commercial sensors, including 17 presentation attacks (PAs) that encompass both 2D and 3D forms. This novel data collection strategy markedly enhances FAS data diversity. Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR2023 workshop. Additionally, we meticulously evaluate representative methods using Protocol 1 and Protocol 2 (Unknown-Type). Through an in-depth examination of the challenge outcomes and benchmark baselines, we provide insightful analyses and propose potential avenues for future research. The dataset is released under Insightface.", "output": "Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it fine-tunes only the bias term and newly added scaling factors in specific layers, yet yields a significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of scaling factors for fast adaptation. 
On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal added cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\\times$256 checkpoint while being 30$\\times$ more training efficient than the closest competitor.", "output": "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. 
The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at:", "output": "Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning to segment without large-scale samples is an inherent capability of humans. Recently, the Segment Anything Model (SAM) has demonstrated significant zero-shot image segmentation performance, attracting considerable attention from the computer vision community. Here, we investigate the capability of SAM for medical image analysis, especially for multi-phase liver tumor segmentation (MPLiTS), in terms of prompts, data resolution, and phases. Experimental results demonstrate that there might be a large gap between SAM and expected performance. 
Fortunately, the qualitative results show that SAM is a powerful annotation tool for the community of interactive medical image segmentation.", "output": "When SAM Meets Medical Images: An Investigation of Segment Anything Model (SAM) on Multi-phase Liver Tumor Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely different noise patterns circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. 
Experiments conducted on LFW and CelebA-HQ datasets with eightgradient-based and two GAN-based attacks validate that our method generalizesto a variety of unseen adversarial attacks.", "output": "Detecting Adversarial Faces Using Only Real Face Self-Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Segment Anything Model (SAM) has recently gained popularity in the fieldof image segmentation. Thanks to its impressive capabilities in all-roundsegmentation tasks and its prompt-based interface, SAM has sparked intensivediscussion within the community. It is even said by many prestigious expertsthat image segmentation task has been \"finished\" by SAM. However, medical imagesegmentation, although an important branch of the image segmentation family,seems not to be included in the scope of Segmenting \"Anything\". Many individualexperiments and recent studies have shown that SAM performs subpar in medicalimage segmentation. A natural question is how to find the missing piece of thepuzzle to extend the strong segmentation capability of SAM to medical imagesegmentation. In this paper, instead of fine-tuning the SAM model, we proposeMed SAM Adapter, which integrates the medical specific domain knowledge to thesegmentation model, by a simple yet effective adaptation technique. Althoughthis work is still one of a few to transfer the popular NLP technique Adapterto computer vision cases, this simple implementation shows surprisingly goodperformance on medical image segmentation. A medical image adapted SAM, whichwe have dubbed Medical SAM Adapter (MSA), shows superior performance on 19medical image segmentation tasks with various image modalities including CT,MRI, ultrasound image, fundus image, and dermoscopic images. 
MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, MedSegDiff, and also outperforms the fully fine-tuned MedSAM with a considerable performance gap. Code will be released at: ", "output": "Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. 
Our new ViT-L/14 model outperforms a largerViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x lesstraining compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentagepoints, which is trained with the same compute budget as our model. These gainshighlight the potential for improving model performance by carefully curatingtraining sets. We view DataComp-1B as only the first step and hope thatDataComp paves the way toward the next generation of multimodal datasets.", "output": "DataComp: In search of the next generation of multimodal datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate 3D object detection in all weather conditions remains a keychallenge to enable the widespread deployment of autonomous vehicles, as mostwork to date has been performed on clear weather data. In order to generalizeto adverse weather conditions, supervised methods perform best if trained fromscratch on all weather data instead of finetuning a model pretrained on clearweather data. Training from scratch on all data will eventually becomecomputationally infeasible and expensive as datasets continue to grow andencompass the full extent of possible weather conditions. On the other hand,naive finetuning on data from a different weather domain can result incatastrophic forgetting of the previously learned domain. Inspired by thesuccess of replay-based continual learning methods, we propose Gradient-basedMaximally Interfered Retrieval (GMIR), a gradient based sampling strategy forreplay. During finetuning, GMIR periodically retrieves samples from theprevious domain dataset whose gradient vectors show maximal interference withthe gradient vector of the current update. 
Our 3D object detection experimentson the SeeingThroughFog (STF) dataset show that GMIR not only overcomesforgetting but also offers competitive performance compared to scratch trainingon all data with a 46.25% reduction in total training time.", "output": "Gradient-based Maximally Interfered Retrieval for Domain Incremental 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When applied to Image-to-text models, interpretability methods often providetoken-by-token explanations namely, they compute a visual explanation for eachtoken of the generated sequence. Those explanations are expensive to computeand unable to comprehensively explain the model's output. Therefore, thesemodels often require some sort of approximation that eventually leads tomisleading explanations. We develop a framework based on SHAP, that allows forgenerating comprehensive, meaningful explanations leveraging the meaningrepresentation of the output sequence as a whole. Moreover, by exploitingsemantic priors in the visual backbone, we extract an arbitrary number offeatures that allows the efficient computation of Shapley values on large-scalemodels, generating at the same time highly meaningful visual explanations. Wedemonstrate that our method generates semantically more expressive explanationsthan traditional methods at a lower compute cost and that it can be generalizedover other explainability methods.", "output": "Interpreting Vision and Language Generative Models with Semantic Visual Priors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dataset Distillation aims to distill an entire dataset's knowledge into a fewsynthetic images. 
The idea is to synthesize a small number of synthetic datapoints that, when given to a learning algorithm as training data, result in amodel approximating one trained on the original data. Despite recent progressin the field, existing dataset distillation methods fail to generalize to newarchitectures and scale to high-resolution datasets. To overcome the aboveissues, we propose to use the learned prior from pre-trained deep generativemodels to synthesize the distilled data. To achieve this, we present a newoptimization algorithm that distills a large number of images into a fewintermediate feature vectors in the generative model's latent space. Our methodaugments existing techniques, significantly improving cross-architecturegeneralization in all settings.", "output": "Generalizing Dataset Distillation via Deep Generative Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While the community of 3D point cloud generation has witnessed a big growthin recent years, there still lacks an effective way to enable intuitive usercontrol in the generation process, hence limiting the general utility of suchmethods. Since an intuitive way of decomposing a shape is through its parts, wepropose to tackle the task of controllable part-based point cloud generation.We introduce DiffFacto, a novel probabilistic generative model that learns thedistribution of shapes with part-level control. We propose a factorization thatmodels independent part style and part configuration distributions and presentsa novel cross-diffusion network that enables us to generate coherent andplausible shapes under our proposed factorization. Experiments show that ourmethod is able to generate novel shapes with multiple axes of control. 
It achieves state-of-the-art part-level generation quality and generates plausible and coherent shapes while enabling various downstream editing applications such as shape interpolation, mixing, and transformation editing. Project website:", "output": "DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose to Transform Scene Graphs (TSG) into more descriptive captions. In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs. After embedding, different graph embeddings contain diverse specific knowledge for generating words with different parts of speech, e.g., object/attribute embedding is good for generating nouns/adjectives. Motivated by this, we design a Mixture-of-Expert (MOE)-based decoder, where each expert is built on MHA, for discriminating the graph embeddings to generate different kinds of words. Since both the encoder and decoder are built on MHA, we construct a homogeneous encoder-decoder, unlike the previous heterogeneous ones, which usually apply a Fully-Connected-based GNN and an LSTM-based decoder. The homogeneous architecture enables us to unify the training configuration of the whole model instead of specifying different training strategies for diverse sub-networks as in the heterogeneous pipeline, which eases the training difficulty. Extensive experiments on the MS-COCO captioning benchmark validate the effectiveness of our TSG. 
The code is in: ", "output": "Transforming Visual Scene Graphs to Image Captions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Two-dimensional, resonant scanners have been utilized in a large variety ofimaging modules due to their compact form, low power consumption, large angularrange, and high speed. However, resonant scanners have problems withnon-optimal and inflexible scanning patterns and inherent phase uncertainty,which limit practical applications. Here we propose methods for optimizeddesign and control of the scanning trajectory of two-dimensional resonantscanners under various physical constraints, including high frame-rate andlimited actuation amplitude. First, we propose an analytical design rule foruniform spatial sampling. We demonstrate theoretically and experimentally thatby including non-repeating scanning patterns, the proposed designs outperformprevious designs in terms of scanning range and fill factor. Second, we showthat we can create flexible scanning patterns that allow focusing onuser-defined Regions-of-Interest (RoI) by modulation of the scanningparameters. The scanning parameters are found by an optimization algorithm. Insimulations, we demonstrate the benefits of these designs with standard metricsand higher-level computer vision tasks (LiDAR odometry and 3D objectdetection). Finally, we experimentally implement and verify both unmodulatedand modulated scanning modes using a two-dimensional, resonant MEMS scanner.Central to the implementations is high bandwidth monitoring of the phase of theangular scans in both dimensions. 
This task is carried out with aposition-sensitive photodetector combined with high-bandwidth electronics,enabling fast spatial sampling at ~ 100Hz frame-rate.", "output": "Resonant Scanning Design and Control for Fast Spatial Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Advanced metering infrastructure (AMI) has been widely used as an intelligentenergy consumption measurement system. Electric power was the representativeenergy source that can be collected by AMI; most existing studies to detectabnormal energy consumption have focused on a single energy source, i.e.,power. Recently, other energy sources such as water, gas, and heating have alsobeen actively collected. As a result, it is necessary to develop a unifiedmethodology for anomaly detection across multiple energy sources; however,research efforts have rarely been made to tackle this issue. The inherentdifficulty with this issue stems from the fact that anomalies are not usuallyannotated. Moreover, existing works of anomaly definition depend on onlyindividual energy sources. In this paper, we first propose a method fordefining anomalies considering not only individual energy sources but alsocorrelations between them. Then, we propose a new Correlation-drivenMulti-Level Multimodal Learning model for anomaly detection on multiple energysources. The distinguishing property of the model incorporates multiple energysources in multi-levels based on the strengths of the correlations betweenthem. Furthermore, we generalize the proposed model in order to integratearbitrary new energy sources with further performance improvement, consideringnot only correlated but also non-correlated sources. 
Through extensive experiments on real-world datasets consisting of three to five energy sources, we demonstrate that the proposed model clearly outperforms existing multimodal learning and recent time-series anomaly detection models, and we observe that its performance improves further as more correlated or non-correlated energy sources are integrated.", "output": "Correlation-Driven Multi-Level Multimodal Learning for Anomaly Detection on Multiple Energy Sources."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Culling newly hatched male chicks in industrial hatcheries poses a serious ethical problem. Both laying and broiler breeders need males, but more are produced than needed. Being able to determine the sex of chicks in the egg at the beginning or early stage of incubation can eliminate ethical problems as well as many additional costs. When we look at the literature, the methods used are very costly, low in applicability, invasive, inadequate in accuracy, or too late to eliminate ethical problems. Considering the embryo's development, the earliest observed candidate feature for sex determination is blood vessels. Detection from blood vessels can eliminate ethical issues, and these vessels can be seen when light is shone into the egg within the first seven days. 
In this study, sex determination was made by morphological analysis from embryonic vascular images obtained in the first week when the light was shined into the egg using a standard camera without any invasive procedure to the egg.", "output": "Sex Detection in the Early Stage of Fertilized Chicken Eggs via Image Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We develop a diagrammatic approach to effective field theories (EFTs) corresponding to deep neural networks at initialization, which dramatically simplifies computations of finite-width corrections to neuron statistics. The structures of EFT calculations make it transparent that a single condition governs criticality of all connected correlators of neuron preactivations. Understanding of such EFTs may facilitate progress in both deep learning and field theory simulations.", "output": "Structures of Neural Network Effective Theories."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper evaluates the viability of using fixed language models for training text classification networks on low-end hardware. We combine language models with a CNN architecture and put together a comprehensive benchmark with 8 datasets covering single-label and multi-label classification of topic, sentiment, and genre. 
Our observations are distilled into a list of trade-offs, concluding that there are scenarios where not fine-tuning a language model yields competitive effectiveness with faster training, requiring only a quarter of the memory compared to fine-tuning.", "output": "Using Language Models on Low-end Hardware."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As Machine Learning models are considered for autonomous decisions with significant social impact, the need for understanding how these models work rises rapidly. Explainable Artificial Intelligence (XAI) aims to provide interpretations for predictions made by Machine Learning models, in order to make the model trustworthy and more transparent for the user. For example, selecting relevant input variables for the problem directly impacts the model's ability to learn and make accurate predictions, so obtaining information about input importance plays a crucial role when training the model. One of the main XAI techniques to obtain input variable importance is the sensitivity analysis based on partial derivatives. However, the existing literature on this method provides no justification for the aggregation metrics used to retrieve information from the partial derivatives. In this paper, a theoretical framework is proposed to study sensitivities of ML models using metric techniques. From this metric interpretation, a complete family of new quantitative metrics called $\\alpha$-curves is extracted. These $\\alpha$-curves provide information with greater depth on the importance of the input variables for a machine learning model than existing XAI methods in the literature. 
We demonstrate the effectiveness of the $\\alpha$-curves using synthetic and real datasets, comparing the results against other XAI methods for variable importance and validating the analysis results with the ground truth or literature information.", "output": "Metric Tools for Sensitivity Analysis with Applications to Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detecting plagiarism involves finding similar items in two different sources. In this article, we propose a novel method for detecting plagiarism that is based on attention mechanism-based long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) word embedding, enhanced with an optimized differential evolution (DE) method for pre-training and a focal loss function for training. BERT can be included in a downstream task and fine-tuned as a task-specific structure, while the trained BERT model is capable of detecting various linguistic characteristics. Unbalanced classification is one of the primary issues with plagiarism detection. We suggest a focal loss-based training technique that carefully learns minority class instances to solve this. Another issue that we tackle is the training phase itself, which typically employs gradient-based methods like back-propagation for the learning process and thus suffers from some drawbacks, including sensitivity to initialization. To initiate the back-propagation (BP) process, we suggest a novel DE algorithm that makes use of a clustering-based mutation operator. Here, a winning cluster is identified for the current DE population, and a fresh updating method is used to produce potential solutions. 
We evaluate our proposed approach on three benchmark datasets (MSRP, SNLI, and SemEval2014) and demonstrate that it performs well when compared to both conventional and population-based methods.", "output": "A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning tasks over image databases often generate masks that annotate image content (e.g., saliency maps, segmentation maps) and enable a variety of applications (e.g., determine if a model is learning spurious correlations or if an image was maliciously modified to mislead a model). While queries that retrieve examples based on mask properties are valuable to practitioners, existing systems do not support such queries efficiently. In this paper, we formalize the problem and propose a system, MaskSearch, that focuses on accelerating queries over databases of image masks. MaskSearch leverages a novel indexing technique and an efficient filter-verification query execution framework. Experiments on real-world datasets with our prototype show that MaskSearch, using indexes approximately 5% the size of the data, accelerates individual queries by up to two orders of magnitude and consistently outperforms existing methods on various multi-query workloads that simulate dataset exploration and analysis processes.", "output": "MaskSearch: Querying Image Masks at Scale."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work investigates pretrained audio representations for few shot Sound Event Detection. 
We specifically address the task of few shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pretraining suitable representations, and methods which transfer them to our few shot learning scenario. Our experiments evaluate the general purpose utility of our pretrained representations on AudioSet, and the utility of proposed few shot methods via tasks constructed from real-world acoustic sequences. Our pretrained embeddings are suitable to the proposed task, and enable multiple aspects of our few shot framework.", "output": "Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the ability of transformer models to approximate the CKY algorithm, using them to directly predict a parse and thus avoid the CKY algorithm's cubic dependence on sentence length. 
We find that on standardconstituency parsing benchmarks this approach achieves competitive or betterperformance than comparable parsers that make use of CKY, while being faster.We also evaluate the viability of this approach for parsing under random PCFGs.Here we find that performance declines as the grammar becomes more ambiguous,suggesting that the transformer is not fully capturing the CKY computation.However, we also find that incorporating additional inductive bias is helpful,and we propose a novel approach that makes use of gradients with respect tochart representations in predicting the parse, in analogy with the CKYalgorithm being the subgradient of a partition function variant with respect tothe chart.", "output": "Approximating CKY with Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Textual backdoor attack, as a novel attack model, has been shown to beeffective in adding a backdoor to the model during training. Defending againstsuch backdoor attacks has become urgent and important. In this paper, wepropose AttDef, an efficient attribution-based pipeline to defend against twoinsertion-based poisoning attacks, BadNL and InSent. Specifically, we regardthe tokens with larger attribution scores as potential triggers since largerattribution words contribute more to the false prediction results and thereforeare more likely to be poison triggers. Additionally, we further utilize anexternal pre-trained language model to distinguish whether input is poisoned ornot. We show that our proposed method can generalize sufficiently well in twocommon attack scenarios (poisoning training data and testing data), whichconsistently improves previous methods. 
For instance, AttDef can successfully mitigate both attacks with an average accuracy of 79.97% (56.59% up) and 48.34% (3.99% up) under pre-training and post-training attack defense respectively, achieving the new state-of-the-art performance on prediction recovery over four benchmark datasets.", "output": "Defending against Insertion-based Textual Backdoor Attacks via Attribution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing number and sophistication of malware attacks, malware detection systems based on machine learning (ML) grow in importance. At the same time, many popular ML models used in malware classification are supervised solutions. These supervised classifiers often do not generalize well to novel malware. Therefore, they need to be re-trained frequently to detect new malware specimens, which can be time-consuming. Our work addresses this problem in a hybrid framework of theoretical Quantum ML, combined with feature selection strategies to reduce the data size and malware classifier training time. The preliminary results show that VQC with XGBoost selected features can get a 78.91% test accuracy on the simulator. The average accuracy for the model trained using the features selected with XGBoost was 74% (+- 11.35%) on the IBM 5 qubits machines.", "output": "Can Feature Engineering Help Quantum Machine Learning for Malware Detection?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Carbon offset programs are critical in the fight against climate change. One emerging threat to the long-term stability and viability of forest carbon offset projects is wildfires, which can release large amounts of carbon and limit the efficacy of associated offsetting credits. 
However, analysis of wildfire risk to forest carbon projects is challenging because existing models for forecasting long-term fire risk are limited in predictive accuracy. Therefore, we propose an explainable artificial intelligence (XAI) model trained on 7 million global satellite wildfire observations. Validation results suggest substantial potential for high resolution, enhanced accuracy projections of global wildfire risk, and the model outperforms the U.S. National Center for Atmospheric Research's leading fire model. Applied to a collection of 190 global forest carbon projects, we find that fire exposure is projected to increase 55% [37-76%] by 2080 under a mid-range scenario (SSP2-4.5). Our results indicate the large wildfire carbon project damages seen in the past decade are likely to become more frequent as forests become hotter and drier. In response, we hope the model can support wildfire managers, policymakers, and carbon market analysts to preemptively quantify and mitigate long-term permanence risks to forest carbon projects.", "output": "Widespread Increases in Future Wildfire Risk to Global Forest Carbon Offset Projects Revealed by Explainable AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. 
To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. We evaluated the ability of these techniques to improve model generalization to various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address generalization due to domain shift in real world applications.", "output": "Synthetic DOmain-Targeted Augmentation (S-DOTA) Improves Model Generalization in Digital Pathology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions. We report new algorithmic developments of gauge-equivariant flow architectures facilitating the generalization to higher-dimensional lattice geometries. 
Specifically, we discuss masked autoregressive transformations with tractable and unbiased Jacobian determinants, a key ingredient for scalable and asymptotically exact flow-based sampling algorithms. For concreteness, results from a proof-of-principle application to SU(3) lattice gauge theory in four space-time dimensions are reported.", "output": "Normalizing flows for lattice gauge theory in arbitrary space-time dimension."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLM's ability to generate abstract plans to simplify challenging control tasks, either by action scoring, or action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to directly serve as the agent: e.g. limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than solving it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the observation for the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. 
On the AlfWorld instruction following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.", "output": "Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The mobile cloud gaming industry has been rapidly growing over the last decade. When streaming gaming videos are transmitted to customers' client devices from cloud servers, algorithms that can monitor distorted video quality without having any reference video available are desirable tools. However, creating No-Reference Video Quality Assessment (NR VQA) models that can accurately predict the quality of streaming gaming videos rendered by computer graphics engines is a challenging problem, since gaming content generally differs statistically from naturalistic videos, often lacks detail, and contains many smooth regions. Until recently, the problem has been further complicated by the lack of adequate subjective quality databases of mobile gaming content. We have created a new gaming-specific NR VQA model called the Gaming Video Quality Evaluator (GAMIVAL), which combines and leverages the advantages of spatial and temporal gaming distorted scene statistics models, a neural noise model, and deep semantic features. Using a support vector regression (SVR) as a regressor, GAMIVAL achieves superior performance on the new LIVE-Meta Mobile Cloud Gaming (LIVE-Meta MCG) video quality database.", "output": "GAMIVAL: Video Quality Prediction on Mobile Cloud Gaming Content."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) power many state-of-the-art systems in natural language processing. 
However, these models are extremely computationally expensive, even at inference time, raising the natural question: when is the extra cost of deploying a larger model worth the anticipated boost in capabilities? Better understanding this tradeoff fundamentally could benefit from an inference efficiency metric that is both (i) easily comparable across models from different providers, and (ii) representative of the true cost of running queries in an isolated performance environment. Unfortunately, access to LLMs today is largely restricted to black-box text generation APIs and raw runtimes measured through this interface do not satisfy these desiderata: model providers can apply various software and hardware optimizations orthogonal to the model, and models served on shared infrastructure are susceptible to performance contention. To circumvent these problems, we propose a new metric for comparing inference efficiency across models. This metric puts models on equal footing as though they were served (i) on uniform hardware and software, and (ii) without performance contention. We call this metric the idealized runtime, and we propose a methodology to efficiently estimate this metric for autoregressive Transformer models. We also propose cost-aware variants that incorporate the number of accelerators needed to serve the model. Using these metrics, we compare ten state-of-the-art LLMs to provide the first analysis of inference efficiency-capability tradeoffs; we make several observations from this analysis, including the fact that the superior inference runtime performance of certain APIs is often a byproduct of optimizations within the API rather than the underlying model. 
Our methodology also facilitates the efficient comparison of different software and hardware stacks.", "output": "Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most of the existing federated multi-armed bandits (FMAB) designs are based on the presumption that clients will implement the specified design to collaborate with the server. In reality, however, it may not be possible to modify the client's existing protocols. To address this challenge, this work focuses on clients who always maximize their individual cumulative rewards, and introduces a novel idea of \"reward teaching\", where the server guides the clients towards global optimality through implicit local reward adjustments. Under this framework, the server faces two tightly coupled tasks of bandit learning and target teaching, whose combination is non-trivial and challenging. A phased approach, called Teaching-After-Learning (TAL), is first designed to encourage and discourage clients' explorations separately. General performance analyses of TAL are established when the clients' strategies satisfy certain mild requirements. With novel technical approaches developed to analyze the warm-start behaviors of bandit algorithms, particularized guarantees of TAL with clients running UCB or epsilon-greedy strategies are then obtained. These results demonstrate that TAL achieves logarithmic regrets while only incurring logarithmic adjustment costs, which is order-optimal w.r.t. a natural lower bound. As a further extension, the Teaching-While-Learning (TWL) algorithm is developed with the idea of successive arm elimination to break the non-adaptive phase separation in TAL. 
Rigorous analyses demonstrate that when facing clients with UCB1, TWL outperforms TAL in terms of the dependencies on sub-optimality gaps thanks to its adaptive design. Experimental results demonstrate the effectiveness and generality of the proposed algorithms.", "output": "Reward Teaching for Federated Multi-armed Bandits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately estimating the probability of failure for safety-critical systems is important for certification. Estimation is often challenging due to high-dimensional input spaces, dangerous test scenarios, and computationally expensive simulators; thus, efficient estimation techniques are important to study. This work reframes the problem of black-box safety validation as a Bayesian optimization problem and introduces an algorithm, Bayesian safety validation, that iteratively fits a probabilistic surrogate model to efficiently predict failures. The algorithm is designed to search for failures, compute the most-likely failure, and estimate the failure probability over an operating domain using importance sampling. We introduce a set of three acquisition functions that focus on reducing uncertainty by covering the design space, optimizing the analytically derived failure boundaries, and sampling the predicted failure regions. Mainly concerned with systems that only output a binary indication of failure, we show that our method also works well in cases where more output information is available. Results show that Bayesian safety validation achieves a better estimate of the probability of failure using orders of magnitude fewer samples and performs well across various safety validation metrics. We demonstrate the algorithm on three test problems with access to ground truth and on a real-world safety-critical subsystem common in autonomous flight: a neural network-based runway detection system. 
This work is open sourced and currently being used to supplement the FAA certification process of the machine learning components for an autonomous cargo aircraft.", "output": "Bayesian Safety Validation for Black-Box Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since its inception in Erkki Oja's seminal paper in 1982, Oja's algorithm has become an established method for streaming principal component analysis (PCA). We study the problem of streaming PCA, where the data-points are sampled from an irreducible, aperiodic, and reversible Markov chain. Our goal is to estimate the top eigenvector of the unknown covariance matrix of the stationary distribution. This setting has implications in situations where data can only be sampled from a Markov Chain Monte Carlo (MCMC) type algorithm, and the goal is to do inference for parameters of the stationary distribution of this chain. Most convergence guarantees for Oja's algorithm in the literature assume that the data-points are sampled IID. For data streams with Markovian dependence, one typically downsamples the data to get a \"nearly\" independent data stream. In this paper, we obtain the first sharp rate for Oja's algorithm on the entire data, where we remove the logarithmic dependence on $n$ resulting from throwing data away in downsampling strategies.", "output": "Streaming PCA for Markovian Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While transformer-based systems have enabled greater accuracies with fewer training examples, data acquisition obstacles still persist for rare-class tasks -- when the class label is very infrequent (e.g. < 5% of samples). 
Active learning has in general been proposed to alleviate such challenges, but choice of selection strategy, the criteria by which rare-class examples are chosen, has not been systematically evaluated. Further, transformers enable iterative transfer-learning approaches. We propose and investigate transfer- and active learning solutions to the rare class problem of dissonance detection through utilizing models trained on closely related tasks and the evaluation of acquisition strategies, including a proposed probability-of-rare-class (PRC) approach. We perform these experiments for a specific rare class problem: collecting language samples of cognitive dissonance from social media. We find that PRC is a simple and effective strategy to guide annotations and ultimately improve model accuracy while transfer-learning in a specific order can improve the cold-start performance of the learner but does not benefit iterations of active learning.", "output": "Transfer and Active Learning for Dissonance Detection: Addressing the Rare-Class Challenge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fueled by the expressive power of deep neural networks, normalizing flows have achieved spectacular success in generative modeling, or learning to draw new samples from a distribution given a finite dataset of training samples. Normalizing flows have also been applied successfully to variational inference, wherein one attempts to learn a sampler based on an expression for the log-likelihood or energy function of the distribution, rather than on data. In variational inference, the unimodality of the reference Gaussian distribution used within the normalizing flow can cause difficulties in learning multimodal distributions. 
We introduce an extension of normalizing flows in which the Gaussian reference is replaced with a reference distribution that is constructed via a tensor network, specifically a matrix product state or tensor train. We show that by combining flows with tensor networks on difficult variational inference tasks, we can improve on the results obtained by using either tool without the other.", "output": "Tensorizing flows: a tool for variational inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. 
We release model weights, inference code, and samples at ", "output": "Shap-E: Generating Conditional 3D Implicit Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This is a speculative essay on interface design and artificial intelligence. Recently there has been a surge of attention to chatbots based on large language models, including widely reported unsavory interactions. We contend that part of the problem is that text is not all you need: sophisticated AI systems should have dashboards, just like all other complicated devices. Assuming the hypothesis that AI systems based on neural networks will contain interpretable models of aspects of the world around them, we discuss what data such dashboards might display. We conjecture that, for many systems, the two most important models will be of the user and of the system itself. We call these the System Model and User Model. We argue that, for usability and safety, interfaces to dialogue-based AI systems should have a parallel display based on the state of the System Model and the User Model. Finding ways to identify, interpret, and display these two models should be a core part of interface research for AI.", "output": "The System Model and the User Model: Exploring AI Dashboard Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most existing exoplanets are discovered using validation techniques rather than being confirmed by complementary observations. These techniques generate a score that is typically the probability of the transit signal being an exoplanet (y(x)=exoplanet) given some information related to that signal (represented by x). Except for the validation technique in Rowe et al. 
(2014) that uses multiplicity information to generate these probability scores, the existing validation techniques ignore the multiplicity boost information. In this work, we introduce a framework with the following premise: given an existing transit signal vetter (classifier), improve its performance using multiplicity information. We apply this framework to several existing classifiers, which include vespa (Morton et al. 2016), Robovetter (Coughlin et al. 2017), AstroNet (Shallue & Vanderburg 2018), ExoNet (Ansdel et al. 2018), GPC and RFC (Armstrong et al. 2020), and ExoMiner (Valizadegan et al. 2022), to support our claim that this framework is able to improve the performance of a given classifier. We then use the proposed multiplicity boost framework for ExoMiner V1.2, which addresses some of the shortcomings of the original ExoMiner classifier (Valizadegan et al. 2022), and validate 69 new exoplanets for systems with multiple KOIs from the Kepler catalog.", "output": "Multiplicity Boost Of Transit Signal Classifiers: Validation of 69 New Exoplanets Using The Multiplicity Boost of ExoMiner."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Random graphs are increasingly becoming objects of interest for modeling networks in a wide range of applications. Latent position random graph models posit that each node is associated with a latent position vector, and that these vectors follow some geometric structure in the latent space. In this paper, we consider random dot product graphs, in which an edge is formed between two nodes with probability given by the inner product of their respective latent positions. We assume that the latent position vectors lie on an unknown one-dimensional curve and are coupled with a response covariate via a regression model. 
Using the geometry of the underlying latent position vectors, we propose a manifold learning and graph embedding technique to predict the response variable on out-of-sample nodes, and we establish convergence guarantees for these responses. Our theoretical results are supported by simulations and an application to Drosophila brain data.", "output": "Semisupervised regression in latent structure networks on unknown manifolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine Learning Health Operations (MLHOps) is the combination of processes for reliable, efficient, usable, and ethical deployment and maintenance of machine learning models in healthcare settings. This paper provides both a survey of work in this area and guidelines for developers and clinicians to deploy and maintain their own models in clinical practice. We cover the foundational concepts of general machine learning operations, describe the initial setup of MLHOps pipelines (including data sources, preparation, engineering, and tools). We then describe long-term monitoring and updating (including data distribution shifts and model updating) and ethical considerations (including bias, fairness, interpretability, and privacy). This work therefore provides guidance across the full pipeline of MLHOps from conception to initial and ongoing deployment.", "output": "MLHOps: Machine Learning for Healthcare Operations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Breast cancer is one of the most threatening diseases in women's life; thus, the early and accurate diagnosis plays a key role in reducing the risk of death in a patient's life. 
Mammography stands as the reference technique for breast cancer screening; nevertheless, many countries still lack access to mammograms due to economic, social, and cultural issues. Latest advances in computational tools, infrared cameras and devices for bio-impedance quantification, have given a chance to emerge other reference techniques like thermography, infrared thermography, electrical impedance tomography and biomarkers found in blood tests, therefore being faster, reliable and cheaper than other methods. In the last two decades, the techniques mentioned above have been considered as parallel and extended approaches for breast cancer diagnosis, as well many authors concluded that false positives and false negatives rates are significantly reduced. Moreover, when a screening method works together with a computational technique, it generates a \"computer-aided diagnosis\" system. The present work aims to review the last breakthroughs about the three techniques mentioned earlier, suggested machine learning techniques to breast cancer diagnosis, thus, describing the benefits of some methods in relation with other ones, such as, logistic regression, decision trees, random forest, deep and convolutional neural networks. With this, we studied several hyperparameters optimization approaches with parzen tree optimizers to improve the performance of baseline models. An exploratory data analysis for each database and a benchmark of convolutional neural networks for the database of thermal images are presented. 
The benchmark process, reviews image classification techniques with convolutional neural networks, like, Resnet50, NasNetmobile, InceptionResnet and Xception.", "output": "Breast Cancer Diagnosis Using Machine Learning Techniques."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In face of the pressing need of decarbonization in the power sector, the re-design of electricity market is necessary as a macro-level approach to accommodate the high penetration of renewable generations, and to achieve power system operation security, economic efficiency, and environmental friendliness. However, existing market design methodologies suffer from the lack of coordination among energy spot market (ESM), ancillary service market (ASM) and financial market (FM), i.e., the \"joint market\", and the lack of reliable simulation-based verification. To tackle these deficiencies, this two-part paper develops a paradigmatic theory and detailed methods of the joint market design using reinforcement-learning (RL)-based simulation. In Part 1, the theory and framework of this novel market design philosophy are proposed. First, the controversial market design options while designing the joint market are summarized as the targeted research questions. Second, the Markov game model is developed to describe the bidding game in the joint market, incorporating the market design options to be determined. Third, a framework of deploying multiple types of RL algorithms to simulate the market model is developed. Finally, several market operation performance indicators are proposed to validate the market design based on the simulation results.", "output": "How to Use Reinforcement Learning to Facilitate Future Electricity Market Design? 
Part 1: A Paradigmatic Theory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed the proliferation of traffic accidents, which led wide researches on Automated Vehicle (AV) technologies to reduce vehicle accidents, especially on risk assessment framework of AV technologies. However, existing time-based frameworks can not handle complex traffic scenarios and ignore the motion tendency influence of each moving objects on the risk distribution, leading to performance degradation. To address this problem, we novelly propose a comprehensive driving risk management framework named RCP-RF based on potential field theory under Connected and Automated Vehicles (CAV) environment, where the pedestrian risk metric are combined into a unified road-vehicle driving risk management framework. Different from existing algorithms, the motion tendency between ego and obstacle cars and the pedestrian factor are legitimately considered in the proposed framework, which can improve the performance of the driving risk model. Moreover, it requires only O(N^2) of time complexity in the proposed method. Empirical studies validate the superiority of our proposed framework against state-of-the-art methods on real-world dataset NGSIM and real AV platform.", "output": "RCP-RF: A Comprehensive Road-car-pedestrian Risk Management Framework based on Driving Risk Potential Field."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combining Graph neural networks (GNNs) with contrastive learning for anomaly detection has drawn rising attention recently. 
Existing graph contrastive anomaly detection (GCAD) methods have primarily focused on improving detection capability through graph augmentation and multi-scale contrast modules. However, the underlying mechanisms of how these modules work have not been fully explored. We dive into the multi-scale and graph augmentation mechanism and observed that multi-scale contrast modules do not enhance the expression, while the multi-GNN modules are the hidden contributors. Previous studies have tended to attribute the benefits brought by multi-GNN to the multi-scale modules. In the paper, we delve into the misconception and propose Multi-GNN and Augmented Graph contrastive framework MAG, which unified the existing GCAD methods in the contrastive self-supervised perspective. We extracted two variants from the MAG framework, L-MAG and M-MAG. The L-MAG is the lightweight instance of the MAG, which outperform the state-of-the-art on Cora and Pubmed with the low computational cost. The variant M-MAG equipped with multi-GNN modules further improve the detection performance. Our study sheds light on the drawback of the existing GCAD methods and demonstrates the potential of multi-GNN and graph augmentation modules. Our code is available at", "output": "Revisiting Graph Contrastive Learning for Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI tasks encompass a wide range of domains and fields. While numerous AI models have been designed for specific tasks and applications, they often require considerable human efforts in finding the right model architecture, optimization algorithm, and hyperparameters. Recent advances in large language models (LLMs) like ChatGPT show remarkable capabilities in various aspects of reasoning, comprehension, and interaction. 
Consequently, we propose developing task-oriented prompts and automatically utilizing LLMs to automate the training pipeline. To implement this concept, we present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters. AutoML-GPT dynamically takes user requests from the model and data cards and composes the corresponding prompt paragraph. Ultimately, with this prompt paragraph, AutoML-GPT will automatically conduct the experiments from data processing to model architecture, hyperparameter tuning, and predicted training log. By leveraging {ours}'s robust language capabilities and the available AI models, AutoML-GPT can tackle numerous intricate AI tasks across various tasks and datasets. This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas. Extensive experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many AI tasks.", "output": "AutoML-GPT: Automatic Machine Learning with GPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electronic Health Record (EHR) provides abundant information through various modalities. However, learning multi-modal EHR is currently facing two major challenges, namely, 1) data embedding and 2) cases with missing modality. A lack of shared embedding function across modalities can discard the temporal relationship between different EHR modalities. On the other hand, most EHR studies are limited to relying only on EHR time-series, and therefore, missing modality in EHR has not been well-explored. Therefore, in this study, we introduce a Unified Multi-modal Set Embedding (UMSE) and Modality-Aware Attention (MAA) with Skip Bottleneck (SB). 
UMSE treats all EHR modalities without a separate imputation module or error-prone carry-forward, whereas MAA with SB learns missing modal EHR with effective modality-aware attention. Our model outperforms other baseline models in mortality, vasopressor need, and intubation need prediction with the MIMIC-IV dataset.", "output": "Learning Missing Modal Electronic Health Records with Unified Multi-modal Data Embedding and Modality-Aware Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.", "output": "String Diagrams with Factorized Densities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Residual networks have shown great success and become indispensable in recent deep neural network models. 
In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits. Previous research has suggested that residual networks can be considered as ensembles of shallow networks, which implies that the final performance of a residual network is influenced by a group of subnetworks. We identify a previously overlooked problem that is analogous to social loafing, where subnetworks within a residual network are prone to exert less effort when working as part of a group compared to working alone. We define this problem as \\textit{network loafing}. Similar to the decreased individual productivity and overall performance as demonstrated in society, network loafing inevitably causes sub-par performance. Inspired by solutions from social psychology, we first propose a novel training scheme called stimulative training, which randomly samples a residual subnetwork and calculates the KL divergence loss between the sampled subnetwork and the given residual network for extra supervision. In order to unleash the potential of stimulative training, we further propose three simple-yet-effective strategies, including a novel KL-loss that only aligns the network logits direction, random smaller inputs for subnetworks, and inter-stage sampling rules. 
Comprehensive experiments and analysis verify the effectiveness of stimulative training as well as its three improved strategies.", "output": "Stimulative Training++: Go Beyond The Performance Limits of Residual Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Magnetic resonance imaging (MRI) is known to have reduced signal-to-noise ratios (SNR) at lower field strengths, leading to signal degradation when producing a low-field MRI image from a high-field one. Therefore, reconstructing a high-field-like image from a low-field MRI is a complex problem due to the ill-posed nature of the task. Additionally, obtaining paired low-field and high-field MR images is often not practical. We theoretically uncovered that the combination of these challenges renders conventional deep learning methods that directly learn the mapping from a low-field MR image to a high-field MR image unsuitable. To overcome these challenges, we introduce a novel meta-learning approach that employs a teacher-student mechanism. Firstly, an optimal-transport-driven teacher learns the degradation process from high-field to low-field MR images and generates pseudo-paired high-field and low-field MRI images. Then, a score-based student solves the inverse problem of reconstructing a high-field-like MR image from a low-field MRI within the framework of iterative regularization, by learning the joint distribution of pseudo-paired images to act as a regularizer. 
Experimental results on real low-field MRI data demonstrate that our proposed method outperforms state-of-the-art unpaired learning methods.", "output": "Meta-Learning Enabled Score-Based Generative Model for 1.5T-Like Image Reconstruction from 0.5T MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for saving computations of GNNs through binarized tensors. Prior work, however, mainly focused on algorithm designs or training techniques, leaving open how to fully realize the performance potential on accelerator hardware. This work redesigns the binary GNN inference backend from the efficiency perspective. It fills the gap by proposing a series of abstractions and techniques to map binary GNNs and their computations best to fit the nature of bit manipulations on GPUs. Results on real-world graphs with GCNs, GraphSAGE, and GraphSAINT show that the proposed techniques outperform state-of-the-art binary GNN implementations by 8-22X with the same accuracy maintained. BitGNN code is publicly available.", "output": "BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentially realized at delayed time instances. 
The partial anonymity attribute implies that a learner, for each state, only observes the aggregate of past reward components generated as a result of different actions taken at that state, but realized at the observation instance. We propose an algorithm named $\\mathrm{DUCRL2}$ to obtain a near-optimal policy for this setting and show that it achieves a regret bound of $\\tilde{\\mathcal{O}}\\left(DS\\sqrt{AT} + d (SA)^3\\right)$ where $S$ and $A$ are the sizes of the state and action spaces, respectively, $D$ is the diameter of the MDP, $d$ is a parameter upper bounded by the maximum reward delay, and $T$ denotes the time horizon. This demonstrates the optimality of the bound in the order of $T$, and an additive impact of the delay.", "output": "Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training necessitates adjusting several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challenge by introducing Cuttlefish, an automated low-rank training approach that eliminates the need for tuning factorization hyperparameters. Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer stabilizes at a constant value. Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged, setting the dimension of each factorization to its corresponding stable rank. 
Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models, and attains up to a 1.2 times faster end-to-end training process while preserving comparable accuracy. Moreover, Cuttlefish outperforms state-of-the-art low-rank model training methods and other prominent baselines. The source code for our implementation can be found at:", "output": "Cuttlefish: Low-rank Model Training without All The Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China's analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers' limited time and attention. \"Naive\" estimators currently used in practice simply ignore the interference, but in doing so incur bias on the order of the treatment effect. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, are impractically high variance. We introduce a novel Monte-Carlo estimator, based on \"Differences-in-Qs\" (DQ) techniques, which achieves bias that is second-order in the treatment effect, while remaining sample-efficient to estimate. On the theoretical side, our contribution is to develop a generalized theory of Taylor expansions for policy evaluation, which extends DQ theory to all major MDP formulations. 
On the practical side, we implement our estimator on Douyin's experimentation platform, and in the process develop DQ into a truly \"plug-and-play\" estimator for interference in real-world settings: one which provides robust, low-bias, low-variance treatment effect estimates; admits computationally cheap, asymptotically exact uncertainty quantification; and reduces MSE by 99% compared to the best existing alternatives in our applications.", "output": "Correcting for Interference in Experiments: A Case Study at Douyin."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study principal component analysis (PCA), where given a dataset in $\\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees. 
We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.", "output": "Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss. The graph contrastive objective maximizes the agreement of multimodal representations, providing a natural interplay for all modalities without special customization. In addition, we extract image features within the bounding box that joins a pair of tokens connected by a graph edge, capturing more targeted visual cues without loading a sophisticated and separately pre-trained image embedder. FormNetV2 establishes new state-of-the-art performance on FUNSD, CORD, SROIE and Payment benchmarks with a more compact model size.", "output": "FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? 
The answer is Yes. Large AI tools, such as large language models, always require more and better quality data to continuously improve, but current copyright laws limit their access to various types of data. Sharing revenue between AI tools and their data providers could transform the current hostile zero-sum game relationship between AI tools and a majority of copyrighted data owners into a collaborative and mutually beneficial one, which is necessary to facilitate the development of a virtuous cycle among AI tools, their users and data providers that drives forward AI technology and builds a healthy AI ecosystem. However, current revenue-sharing business models do not work for AI tools in the forthcoming AI era, since the most widely used metrics for website-based traffic and action, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools. A completely new revenue-sharing business model, which must be almost independent of AI tools and be easily explained to data providers, needs to establish a prompt-based scoring system to measure data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers for AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era where all parties benefit.", "output": "Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present methods for conditional and residual coding in the context of scalable coding for humans and machines. 
Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.", "output": "Conditional and Residual Methods in Scalable Coding for Humans and Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Laplacian regularized stratified models (LRSM) are models that utilize the explicit or implicit network structure of the sub-problems as defined by the categorical features called strata (e.g., age, region, time, forecast horizon, etc.), and draw upon data from neighboring strata to enhance the parameter learning of each sub-problem. They have been widely applied in machine learning and signal processing problems, including but not limited to time series forecasting, representation learning, graph clustering, max-margin classification, and general few-shot learning. Nevertheless, existing works on LRSM have either assumed a known graph or are restricted to specific applications. In this paper, we start by showing the importance and sensitivity of graph weights in LRSM, and provably show that the sensitivity can be arbitrarily large when the parameter scales and sample sizes are heavily imbalanced across nodes. 
We then propose a generic approach to jointly learn the graph while fitting the model parameters by solving a single optimization problem. We interpret the proposed formulation from both a graph connectivity viewpoint and an end-to-end Bayesian perspective, and propose an efficient algorithm to solve the problem. Convergence guarantees of the proposed optimization algorithm are also provided despite the lack of global strong smoothness of the Laplacian regularization term typically required in the existing literature, which may be of independent interest. Finally, we illustrate the efficiency of our approach compared to existing methods by various real-world numerical examples.", "output": "Joint Graph Learning and Model Fitting in Laplacian Regularized Stratified Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Layer Normalization (LayerNorm) is an inherent component in all Transformer-based models. In this paper, we show that LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it. This is in contrast to the common belief that LayerNorm's only role is to normalize the activations during the forward pass, and their gradients during the backward pass. We consider a geometric interpretation of LayerNorm and show that it consists of two components: (a) projection of the input vectors to a $d-1$ space that is orthogonal to the $\\left[1,1,...,1\\right]$ vector, and (b) scaling of all vectors to the same norm of $\\sqrt{d}$. 
We show that each of these components is important for the attention layer that follows it in Transformers: (a) projection allows the attention mechanism to create an attention query that attends to all keys equally, offloading the need to learn this operation by the attention; and (b) scaling allows each key to potentially receive the highest attention, and prevents keys from being \"un-select-able\". We show empirically that Transformers do indeed benefit from these properties of LayerNorm in general language modeling and even in computing simple functions such as \"majority\". Our code is available at .", "output": "On the Expressivity Role of LayerNorm in Transformers' Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning (RL) agents are known to be vulnerable to evasion attacks during deployment. In single-agent environments, attackers can inject imperceptible perturbations on the policy or value network's inputs or outputs; in multi-agent environments, attackers can control an adversarial opponent to indirectly influence the victim's observation. Adversarial policies offer a promising solution to craft such attacks. Still, current approaches either require perfect or partial knowledge of the victim policy or suffer from sample inefficiency due to the sparsity of task-related rewards. To overcome these limitations, we propose the Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box evasion attacks in single- and multi-agent environments without any knowledge of the victim policy. IMAP uses four intrinsic objectives based on state coverage, policy coverage, risk, and policy divergence to encourage exploration and discover stronger attacking skills. We also design a novel Bias-Reduction (BR) method to boost IMAP further. 
Our experiments demonstrate the effectiveness of these intrinsic objectives and BR in improving adversarial policy learning in the black-box setting against multiple types of victim agents in various single- and multi-agent MuJoCo environments. Notably, our IMAP reduces the performance of the state-of-the-art robust WocaR-PPO agents by 34%-54% and achieves a SOTA attacking success rate of 83.91% in the two-player zero-sum game YouShallNotPass.", "output": "IMAP: Intrinsically Motivated Adversarial Policy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian optimization (BO) is a powerful tool for seeking the global optimum of black-box functions. While evaluations of the black-box functions can be highly costly, it is desirable to reduce the use of expensive labeled data. For the first time, we introduce a teacher-student model to exploit semi-supervised learning that can make use of large amounts of unlabeled data under the context of BO. Importantly, we show that the selection of the validation and unlabeled data is key to the performance of BO. To optimize the sampling of unlabeled data, we employ a black-box parameterized sampling distribution optimized as part of the employed bi-level optimization framework. Taking one step further, we demonstrate that the performance of BO can be further improved by selecting unlabeled data from a dynamically fitted extreme value distribution. Our BO method operates in a learned latent space with reduced dimensionality, making it scalable to high-dimensional problems. 
The proposed approach significantly outperforms the existing BO methods on several synthetic and real-world optimization tasks.", "output": "High-dimensional Bayesian Optimization via Semi-supervised Learning with Optimized Unlabeled Data Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The critical heat flux (CHF) is an essential safety boundary in boiling heat transfer processes employed in high heat flux thermal-hydraulic systems. Identifying CHF is vital for preventing equipment damage and ensuring overall system safety, yet it is challenging due to the complexity of the phenomena. For an in-depth understanding of the complicated phenomena, various methodologies have been devised, but the acquisition of high-resolution data is limited by the substantial resource consumption required. This study presents a data-driven, image-to-image translation method for reconstructing thermal data of a boiling system at CHF using conditional generative adversarial networks (cGANs). The supervised learning process relies on paired images, which include total reflection visualizations and infrared thermometry measurements obtained from flow boiling experiments. 
Our proposed approach has the potential to not only provide evidence connecting phase interface dynamics with thermal distribution but also to simplify the laborious and time-consuming experimental setup and data-reduction procedures associated with infrared thermal imaging, thereby providing an effective solution for CHF diagnosis.", "output": "Critical heat flux diagnosis using conditional generative adversarial networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We provide a psychometric-grounded exposition of bias and fairness as applied to a typical machine learning pipeline for affective computing. We expand on an interpersonal communication framework to elucidate how to identify sources of bias that may arise in the process of inferring human emotions and other psychological constructs from observed behavior. Various methods and metrics for measuring fairness and bias are discussed along with pertinent implications within the United States legal context. We illustrate how to measure some types of bias and fairness in a case study involving automatic personality and hireability inference from multimodal data collected in video interviews for mock job applications. 
We encourage affective computing researchers and practitioners to encapsulate bias and fairness in their research processes and products and to consider their role, agency, and responsibility in promoting equitable and just systems.", "output": "Integrating Psychometrics and Computing Perspectives on Bias and Fairness in Affective Computing: A Case Study of Automated Video Interviews."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial neural networks (ANNs) are increasingly used as research models, but questions remain about their generalizability and representational invariance. Biological neural networks under social constraints evolved to enable communicable representations, demonstrating generalization capabilities. This study proposes a communication protocol between cooperative agents to analyze the formation of individual and shared abstractions and their impact on task performance. This communication protocol aims to mimic language features by encoding high-dimensional information through low-dimensional representation. Using grid-world mazes and reinforcement learning, teacher ANNs pass a compressed message to a student ANN for better task completion. Through this, the student achieves a higher goal-finding rate and generalizes the goal location across task worlds. Further optimizing message content to maximize student reward improves information encoding, suggesting that an accurate representation in the space of messages requires bi-directional input. 
This highlights the role of language as a common representation between agents and its implications on generalization capabilities.", "output": "A framework for the emergence and analysis of language in social learning agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models generate text based on successively sampling the next word. A decoding procedure based on nucleus (top-$p$) sampling chooses from the smallest possible set of words whose cumulative probability exceeds the probability $p$. In this work, we assess whether a top-$p$ set is indeed aligned with its probabilistic meaning in various linguistic contexts. We employ conformal prediction, a calibration procedure that focuses on the construction of minimal prediction sets according to a desired confidence level, to calibrate the parameter $p$ as a function of the entropy of the next word distribution. We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.", "output": "Conformal Nucleus Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In Causal Discovery with latent variables, we define two data paradigms: definite data: a single-skeleton structure with observed nodes single-value, and indefinite data: a set of multi-skeleton structures with observed nodes multi-value. Multi-skeletons induce low sample utilization and multi-values induce incapability of the distribution assumption, both leading to the fact that recovering causal relations from indefinite data is, as of yet, largely unexplored. We design the causal strength variational model to address these two problems. 
Specifically, we leverage the causal strength instead of independent noise as latent variable to mediate the evidence lower bound. By this design ethos, the causal strength of different skeletons is regarded as a distribution and can be expressed as a single-valued causal graph matrix. Moreover, considering the latent confounders, we disentangle the causal graph G into two relation subgraphs O and C. O contains pure relations between observed nodes, while C represents the relations from latent variables to observed nodes. We summarize the above designs as Confounding Disentanglement Causal Discovery (biCD), which is tailored to learn causal representation from indefinite data under the latent confounding. Finally, we conduct comprehensive experiments on synthetic and real-world data to demonstrate the effectiveness of our method.", "output": "Learning to Recover Causal Relationship from Indefinite Data in the Presence of Latent Confounders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Blahut-Arimoto (BA) algorithm has played a fundamental role in the numerical computation of rate-distortion (RD) functions. This algorithm possesses a desirable monotonic convergence property by alternatively minimizing its Lagrangian with a fixed multiplier. In this paper, we propose a novel modification of the BA algorithm, letting the multiplier be updated in each iteration via a one-dimensional root-finding step with respect to a monotonic univariate function, which can be efficiently implemented by Newton's method. This allows the multiplier to be updated in a flexible and efficient manner, overcoming a major drawback of the original BA algorithm wherein the multiplier is fixed throughout iterations. 
Consequently, the modified algorithm is capable of directly computing the RD function for a given target distortion, without exploring the entire RD curve as in the original BA algorithm. A theoretical analysis shows that the modified algorithm still converges to the RD function and the convergence rate is $\\Theta(1/n)$, where $n$ denotes the number of iterations. Numerical experiments demonstrate that the modified algorithm directly computes the RD function with a given target distortion, and it significantly accelerates the original BA algorithm.", "output": "Variations on a Theme by Blahut and Arimoto."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we consider the generalization ability of deep wide feedforward ReLU neural networks defined on a bounded domain $\\mathcal X\\subset \\mathbb R^{d}$. We first demonstrate that the generalization ability of the neural network can be fully characterized by that of the corresponding deep neural tangent kernel (NTK) regression. We then investigate on the spectral properties of the deep NTK and show that the deep NTK is positive definite on $\\mathcal{X}$ and its eigenvalue decay rate is $(d+1)/d$. 
Thanks to the well established theories in kernel regression, we then conclude that multilayer wide neural networks trained by gradient descent with proper early stopping achieve the minimax rate, provided that the regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK. Finally, we illustrate that the overfitted multilayer wide neural networks cannot generalize well on $\\mathbb S^{d}$.", "output": "Statistical Optimality of Deep Wide Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There has been a rapid growth in biomedical literature, yet capturing the heterogeneity of the bibliographic information of these articles remains relatively understudied. Although graph mining research via heterogeneous graph neural networks has taken center stage, it remains unclear whether these approaches capture the heterogeneity of the PubMed database, a vast digital repository containing over 33 million articles. We introduce PubMed Graph Benchmark (PGB), a new benchmark dataset for evaluating heterogeneous graph embeddings for biomedical literature. PGB is one of the largest heterogeneous networks to date and consists of 30 million English articles. The benchmark contains rich metadata including abstract, authors, citations, MeSH terms, MeSH hierarchy, and some other information. The benchmark contains an evaluation task of 21 systematic reviews topics from 3 different datasets. 
In PGB, we aggregate the metadata associated with the biomedical articles from PubMed into a unified source and make the benchmark publicly available for any future works.", "output": "PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transforming a design into a high-quality product is a challenge in metal additive manufacturing due to rare events which can cause defects to form. Detecting these events in-situ could, however, reduce inspection costs, enable corrective action, and is the first step towards a future of tailored material properties. In this study a model is trained on laser input information to predict nominal laser melting conditions. An anomaly score is then calculated by taking the difference between the predictions and new observations. The model is evaluated on a dataset with known defects achieving an F1 score of 0.821. This study shows that anomaly detection methods are an important tool in developing robust defect detection methods.", "output": "In-situ Anomaly Detection in Additive Manufacturing with Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We describe how interpretable boosting algorithms based on ridge-regularized generalized linear models can be used to analyze high-dimensional environmental data. We illustrate this by using environmental, social, human and biophysical data to predict the financial vulnerability of farmers in Chile and Tunisia against climate hazards. We show how group structures can be considered and how interactions can be found in high-dimensional datasets using a novel 2-step boosting approach. The advantages and efficacy of the proposed method are shown and discussed. 
Results indicate that the presence of interaction effects only improves predictive power when included in two-step boosting. The most important variable in predicting all types of vulnerabilities are natural assets. Other important variables are the type of irrigation, economic assets and the presence of crop damage of near farms.", "output": "Using interpretable boosting algorithms for modeling environmental and agricultural data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) enables training ML models on edge clients without sharing data. However, the federated model's performance on local data varies, disincentivising the participation of clients who benefit little from FL. Fair FL reduces accuracy disparity by focusing on clients with higher losses while personalisation locally fine-tunes the model. Personalisation provides a participation incentive when an FL model underperforms relative to one trained locally. For situations where the federated model provides a lower accuracy than a model trained entirely locally by a client, personalisation improves the accuracy of the pre-trained federated weights to be similar to or exceed those of the local client model. This paper evaluates two Fair FL (FFL) algorithms as starting points for personalisation. Our results show that FFL provides no benefit to relative performance in a language task and may double the number of underperforming clients for an image task. Instead, we propose Personalisation-aware Federated Learning (PaFL) as a paradigm that pre-emptively uses personalisation losses during training. Our technique shows a 50% reduction in the number of underperforming clients for the language task while lowering the number of underperforming clients in the image task instead of doubling it. 
Thus, evidence indicates that it may allow a broader set of devices to benefit from FL and represents a promising avenue for future experimentation and theoretical analysis.", "output": "Can Fair Federated Learning reduce the need for Personalisation?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.", "output": "Explainable Reinforcement Learning via a Causal World Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-domain learning (MDL) refers to simultaneously constructing a model or a set of models on datasets collected from different domains. Conventional approaches emphasize domain-shared information extraction and domain-private information preservation, following the shared-private framework (SP models), which offers significant advantages over single-domain learning. 
However, thelimited availability of annotated data in each domain considerably hinders theeffectiveness of conventional supervised MDL approaches in real-worldapplications. In this paper, we introduce a novel method called multi-domaincontrastive learning (MDCL) to alleviate the impact of insufficient annotationsby capturing both semantic and structural information from both labeled andunlabeled data.Specifically, MDCL comprises two modules: inter-domain semanticalignment and intra-domain contrast. The former aims to align annotatedinstances of the same semantic category from distinct domains within a sharedhidden space, while the latter focuses on learning a cluster structure ofunlabeled instances in a private hidden space for each domain. MDCL is readilycompatible with many SP models, requiring no additional model parameters andallowing for end-to-end training. Experimental results across five textual andimage multi-domain datasets demonstrate that MDCL brings noticeable improvementover various SP models.Furthermore, MDCL can further be employed inmulti-domain active learning (MDAL) to achieve a superior initialization,eventually leading to better overall performance.", "output": "Multi-Domain Learning From Insufficient Annotations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The anonymity on the Darknet allows vendors to stay undetected by usingmultiple vendor aliases or frequently migrating between markets. Consequently,illegal markets and their connections are challenging to uncover on theDarknet. To identify relationships between illegal markets and their vendors,we propose VendorLink, an NLP-based approach that examines writing patterns toverify, identify, and link unique vendor accounts across text advertisements(ads) on seven public Darknet markets. 
In contrast to existing literature,VendorLink utilizes the strength of supervised pre-training to performclosed-set vendor verification, open-set vendor identification, andlow-resource market adaption tasks. Through VendorLink, we uncover (i) 15migrants and 71 potential aliases in the Alphabay-Dreams-Silk dataset, (ii) 17migrants and 3 potential aliases in the Valhalla-Berlusconi dataset, and (iii)75 migrants and 10 potential aliases in the Traderoute-Agora dataset.Altogether, our approach can help Law Enforcement Agencies (LEA) make moreinformed decisions by verifying and identifying migrating vendors and theirpotential aliases on existing and Low-Resource (LR) emerging Darknet markets.", "output": "VendorLink: An NLP approach for Identifying & Linking Vendor Migrants & Potential Aliases on Darknet Markets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) aims to train machine learning models for multipleclients without sharing their own private data. Due to the heterogeneity ofclients' local data distribution, recent studies explore the personalized FLthat learns and deploys distinct local models with the help of auxiliary globalmodels. However, the clients can be heterogeneous in terms of not only localdata distribution, but also their computation and communication resources. Thecapacity and efficiency of personalized models are restricted by thelowest-resource clients, leading to sub-optimal performance and limitedpracticality of personalized FL. To overcome these challenges, we propose anovel approach named pFedGate for efficient personalized FL by adaptively andefficiently learning sparse local models. 
With a lightweight trainable gatinglayer, pFedGate enables clients to reach their full potential in model capacityby generating different sparse models accounting for both the heterogeneousdata distributions and resource constraints. Meanwhile, the computation andcommunication efficiency are both improved thanks to the adaptability betweenthe model sparsity and clients' resources. Further, we theoretically show thatthe proposed pFedGate has superior complexity with guaranteed convergence andgeneralization error. Extensive experiments show that pFedGate achievessuperior global accuracy, individual accuracy and efficiency simultaneouslyover state-of-the-art methods. We also demonstrate that pFedGate performsbetter than competitors in the novel clients participation and partial clientsparticipation scenarios, and can learn meaningful sparse local models adaptedto different data distributions.", "output": "Efficient Personalized Federated Learning via Sparse Model-Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work introduces interpretable regional descriptors, or IRDs, for local,model-agnostic interpretations. IRDs are hyperboxes that describe how anobservation's feature values can be changed without affecting its prediction.They justify a prediction by providing a set of \"even if\" arguments(semi-factual explanations), and they indicate which features affect aprediction and whether pointwise biases or implausibilities exist. A concreteuse case shows that this is valuable for both machine learning modelers andpersons subject to a decision. We formalize the search for IRDs as anoptimization problem and introduce a unifying framework for computing IRDs thatcovers desiderata, initialization techniques, and a post-processing method. Weshow how existing hyperbox methods can be adapted to fit into this unifiedframework. 
A benchmark study compares the methods based on several qualitymeasures and identifies two strategies to improve IRDs.", "output": "Interpretable Regional Descriptors: Hyperbox-Based Local Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A large-scale dynamic network (LDN) is a source of data in many bigdata-related applications due to their large number of entities and large-scaledynamic interactions. They can be modeled as a high-dimensional incomplete(HDI) tensor that contains a wealth of knowledge about time patterns. A Latentfactorization of tensors (LFT) model efficiently extracts this time pattern,which can be established using stochastic gradient descent (SGD) solvers.However, LFT models based on SGD are often limited by training schemes and havepoor tail convergence. To solve this problem, this paper proposes a novelnonlinear LFT model (MNNL) based on momentum-incorporated SGD, which extractsnon-negative latent factors from HDI tensors to make training unconstrained andcompatible with general training schemes, while improving convergence accuracyand speed. Empirical studies on two LDN datasets show that compared to existingmodels, the MNNL model has higher prediction accuracy and convergence speed.", "output": "A Momentum-Incorporated Non-Negative Latent Factorization of Tensors Model for Dynamic Network Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, DeepNorm scales Transformers into extremely deep (i.e., 1000layers) and reveals the promising potential of deep scaling. To stabilize thetraining of deep models, DeepNorm (Wang et al., 2022) attempts to constrain themodel update to a constant value. 
Although applying such a constraint canbenefit the early stage of model training, it may lead to undertrained modelsduring the whole training procedure. In this paper, we propose BranchNorm,which dynamically rescales the non-residual branch of Transformer in accordancewith the training period. BranchNorm not only theoretically stabilizes thetraining with smooth gradient norms at the early stage, but also encouragesbetter convergence in the subsequent training stage. Experiment results onmultiple translation tasks demonstrate that BranchNorm achieves a bettertrade-off between training stability and converge performance.", "output": "BranchNorm: Robustly Scaling Extremely Deep Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pseudo labeling is a popular and effective method to leverage the informationof unlabeled data. Conventional instance-aware pseudo labeling methods oftenassign each unlabeled instance with a pseudo label based on its predictedprobabilities. However, due to the unknown number of true labels, these methodscannot generalize well to semi-supervised multi-label learning (SSMLL)scenarios, since they would suffer from the risk of either introducing falsepositive labels or neglecting true positive ones. In this paper, we propose tosolve the SSMLL problems by performing Class-distribution-Aware Pseudo labeling(CAP), which encourages the class distribution of pseudo labels to approximatethe true one. Specifically, we design a regularized learning frameworkconsisting of the class-aware thresholds to control the number of pseudo labelsfor each class. Given that the labeled and unlabeled examples are sampledaccording to the same distribution, we determine the thresholds by exploitingthe empirical class distribution, which can be treated as a tight approximationto the true one. 
Theoretically, we show that the generalization performance ofthe proposed method is dependent on the pseudo labeling error, which can besignificantly reduced by the CAP strategy. Extensive experimental results onmultiple benchmark datasets validate that CAP can effectively solve the SSMLLproblems.", "output": "Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The aim of this paper is to present a mathematical framework for tensor PCA.The proposed approach is able to overcome the limitations of previous methodsthat extract a low dimensional subspace by iteratively solving an optimizationproblem. The core of the proposed approach is the derivation of a basis intensor space from a real self-adjoint tensor operator, thus reducing theproblem of deriving a basis to an eigenvalue problem. Three different caseshave been studied to derive: i) a basis from a self-adjoint tensor operator;ii) a rank-1 basis; iii) a basis in a subspace. In particular, the equivalencebetween eigenvalue equation for a real self-adjoint tensor operator andstandard matrix eigenvalue equation has been proven. For all the three casesconsidered, a subspace approach has been adopted to derive a tensor PCA.Experiments on image datasets validate the proposed mathematical framework.", "output": "Tensor PCA from basis in tensor space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Subset selection tasks, arise in recommendation systems and search enginesand ask to select a subset of items that maximize the value for the user. Thevalues of subsets often display diminishing returns, and hence, submodularfunctions have been used to model them. 
If the inputs defining the submodularfunction are known, then existing algorithms can be used. In many applications,however, inputs have been observed to have social biases that reduce theutility of the output subset. Hence, interventions to improve the utility aredesired. Prior works focus on maximizing linear functions -- a special case ofsubmodular functions -- and show that fairness constraint-based interventionscan not only ensure proportional representation but also achieve near-optimalutility in the presence of biases. We study the maximization of a family ofsubmodular functions that capture functions arising in the aforementionedapplications. Our first result is that, unlike linear functions,constraint-based interventions cannot guarantee any constant fraction of theoptimal utility for this family of submodular functions. Our second result isan algorithm for submodular maximization. The algorithm provably outputssubsets that have near-optimal utility for this family under mild assumptionsand that proportionally represent items from each group. In empiricalevaluation, with both synthetic and real-world data, we observe that thisalgorithm improves the utility of the output subset for this family ofsubmodular functions over baselines.", "output": "Maximizing Submodular Functions for Recommendation in the Presence of Biases."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this thesis, we develop methods to enhance the interpretability of recentrepresentation learning techniques in natural language processing (NLP) whileaccounting for the unavailability of annotated data. We choose to leverageVariational Autoencoders (VAEs) due to their efficiency in relatingobservations to latent generative factors and their effectiveness indata-efficient learning and interpretable representation learning. 
As a firstcontribution, we identify and remove unnecessary components in the functioningscheme of semi-supervised VAEs making them faster, smaller and easier todesign. Our second and main contribution is to use VAEs and Transformers tobuild two models with inductive bias to separate information in latentrepresentations into understandable concepts without annotated data. The firstmodel, Attention-Driven VAE (ADVAE), is able to separately represent andcontrol information about syntactic roles in sentences. The second model,QKVAE, uses separate latent variables to form keys and values for itsTransformer decoder and is able to separate syntactic and semantic informationin its neural representations. In transfer experiments, QKVAE has competitiveperformance compared to supervised models and equivalent performance to asupervised model using 50K annotated samples. Additionally, QKVAE displaysimproved syntactic role disentanglement capabilities compared to ADVAE.Overall, we demonstrate that it is possible to enhance the interpretability ofstate-of-the-art deep learning architectures for language modeling withunannotated data in situations where text data is abundant but annotations arescarce.", "output": "Interpretable Sentence Representation with Variational Autoencoders and Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the last few years Explainable Clustering has gathered a lot ofattention. Dasgupta et al. [ICML'20] initiated the study of explainable k-meansand k-median clustering problems where the explanation is captured by athreshold decision tree which partitions the space at each node using axisparallel hyperplanes. Recently, Laber et al. 
[Pattern Recognition'23] made acase to consider the depth of the decision tree as an additional complexitymeasure of interest.In this work, we prove that even when the input points are in the Euclideanplane, then any depth reduction in the explanation incurs unbounded loss in thek-means and k-median cost. Formally, we show that there exists a data set X inthe Euclidean plane, for which there is a decision tree of depth k-1 whosek-means/k-median cost matches the optimal clustering cost of X, but everydecision tree of depth less than k-1 has unbounded cost w.r.t. the optimal costof clustering. We extend our results to the k-center objective as well, albeitwith weaker guarantees.", "output": "Impossibility of Depth Reduction in Explainable Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When deploying artificial agents in real-world environments where theyinteract with humans, it is crucial that their behavior is aligned with thevalues, social norms or other requirements of that environment. However, manyenvironments have implicit constraints that are difficult to specify andtransfer to a learning agent. To address this challenge, we propose a novelmethod that utilizes the principle of maximum causal entropy to learnconstraints and an optimal policy that adheres to these constraints, usingdemonstrations of agents that abide by the constraints. We prove convergence ina tabular setting and provide an approximation which scales to complexenvironments. We evaluate the effectiveness of the learned policy by assessingthe reward received and the number of constraint violations, and we evaluatethe learned cost function based on its transferability to other agents. 
Ourmethod has been shown to outperform state-of-the-art approaches across avariety of tasks and environments, and it is able to handle problems withstochastic dynamics and a continuous state-action space.", "output": "Maximum Causal Entropy Inverse Constrained Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Transformer is gaining increasing attention in the field of machinelearning and has demonstrated state-of-the-art performance on benchmarks forgraph representation learning. However, as current implementations of GraphTransformer primarily focus on learning representations of small-scale graphs,the quadratic complexity of the global self-attention mechanism presents achallenge for full-batch training when applied to larger graphs. Additionally,conventional sampling-based methods fail to capture necessary high-levelcontextual information, resulting in a significant loss of performance. In thispaper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as asolution to these challenges. HSGT successfully scales the Transformerarchitecture to node representation learning tasks on large-scale graphs, whilemaintaining high performance. By utilizing graph hierarchies constructedthrough coarsening techniques, HSGT efficiently updates and stores multi-scaleinformation in node embeddings at different levels. Together withsampling-based training methods, HSGT effectively captures and aggregatesmulti-level information on the hierarchical graph using only Transformerblocks. 
Empirical evaluations demonstrate that HSGT achieves state-of-the-artperformance on large-scale benchmarks with graphs containing millions of nodeswith high efficiency.", "output": "Hierarchical Transformer for Scalable Graph Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantum generative models, in providing inherently efficient samplingstrategies, show promise for achieving a near-term advantage on quantumhardware. Nonetheless, important questions remain regarding their scalability.In this work, we investigate the barriers to the trainability of quantumgenerative models posed by barren plateaus and exponential loss concentration.We explore the interplay between explicit and implicit models and losses, andshow that using implicit generative models (such as quantum circuit-basedmodels) with explicit losses (such as the KL divergence) leads to a new flavourof barren plateau. In contrast, the Maximum Mean Discrepancy (MMD), which is apopular example of an implicit loss, can be viewed as the expectation value ofan observable that is either low-bodied and trainable, or global anduntrainable depending on the choice of kernel. However, in parallel, wehighlight that the low-bodied losses required for trainability cannot ingeneral distinguish high-order correlations, leading to a fundamental tensionbetween exponential concentration and the emergence of spurious minima. Wefurther propose a new local quantum fidelity-type loss which, by leveragingquantum circuits to estimate the quality of the encoded distribution, is bothfaithful and enjoys trainability guarantees. 
Finally, we compare theperformance of different loss functions for modelling real-world data from theHigh-Energy-Physics domain and confirm the trends predicted by our theoreticalresults.", "output": "Trainability barriers and opportunities in quantum generative modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a widely used technique for improving model performancein machine learning, particularly in computer vision and natural languageprocessing. Recently, there has been increasing interest in applyingaugmentation techniques to reinforcement learning (RL) problems, with a focuson image-based augmentation. In this paper, we explore a set of genericwrappers designed to augment RL environments with noise and encourage agentexploration and improve training data diversity which are applicable to a broadspectrum of RL algorithms and environments. Specifically, we concentrate onaugmentations concerning states, rewards, and transition dynamics and introducetwo novel augmentation techniques. In addition, we introduce a noise ratehyperparameter for control over the frequency of noise injection. We presentexperimental results on the impact of these wrappers on return using threepopular RL algorithms, Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), andProximal Policy Optimization (PPO), across five MuJoCo environments. To supportthe choice of augmentation technique in practice, we also present analysis thatexplores the performance these techniques across environments. 
Lastly, wepublish the wrappers in our noisyenv repository for use with gym environments.", "output": "Simple Noisy Environment Augmentation for Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Binary Neural Networks (BNNs) use 1-bit weights and activations toefficiently execute deep convolutional neural networks on edge devices.Nevertheless, the binarization of the first layer is conventionally excluded,as it leads to a large accuracy loss. The few works addressing the first layerbinarization, typically increase the number of input channels to enhance datarepresentation; such data expansion raises the amount of operations needed andit is feasible only on systems with enough computational resources. In thiswork, we present a new method to binarize the first layer using directly the8-bit representation of input data; we exploit the standard bit-planes encodingto extract features bit-wise (using depth-wise convolutions); after are-weighting stage, features are fused again. The resulting model is fullybinarized and our first layer binarization approach is model independent. Theconcept is evaluated on three classification datasets (CIFAR10, SVHN andCIFAR100) for different model architectures (VGG and ResNet) and, the proposedtechnique outperforms state of the art methods both in accuracy and BMACsreduction.", "output": "Input Layer Binarization with Bit-Plane Encoding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning is an important framework in modern machine learning thatseeks to integrate the training of learning models from multiple users, eachuser having their own local data set, in a way that is sensitive to dataprivacy and to communication loss constraints. 
In clustered federated learning, one assumes an additional unknown group structure among users, and the goal is to train models that are useful for each group, rather than simply training a single global model for all users. In this paper, we propose a novel solution to the problem of clustered federated learning that is inspired by ideas in consensus-based optimization (CBO). Our new CBO-type method is based on a system of interacting particles that is oblivious to group memberships. Our model is motivated by rigorous mathematical reasoning, including a mean field analysis describing the large number of particles limit of our particle system, as well as convergence guarantees for the simultaneous global optimization of general non-convex objective functions (corresponding to the loss functions of each cluster of users) in the mean-field regime. Experimental results demonstrate the efficacy of our FedCBO algorithm compared to other state-of-the-art methods and help validate our methodological and theoretical work.", "output": "FedCBO: Reaching Group Consensus in Clustered Federated Learning through Consensus-based Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have achieved remarkable success in various real-world applications. However, recent studies highlight the vulnerability of GNNs to malicious perturbations. Previous adversaries primarily focus on graph modifications or node injections to existing graphs, yielding promising results but with notable limitations. Graph modification attack (GMA) requires manipulation of the original graph, which is often impractical, while graph injection attack (GIA) necessitates training a surrogate model in the black-box setting, leading to significant performance degradation due to divergence between the surrogate architecture and the actual victim model.
Furthermore, most methods concentrate on a single attack goal and lack a generalizable adversary to develop distinct attack strategies for diverse goals, thus limiting precise control over victim model behavior in real-world scenarios. To address these issues, we present a gradient-free generalizable adversary that injects a single malicious node to manipulate the classification result of a target node in the black-box evasion setting. We propose Gradient-free Generalizable Single Node Injection Attack, namely G$^2$-SNIA, a reinforcement learning framework employing Proximal Policy Optimization. By directly querying the victim model, G$^2$-SNIA learns patterns from exploration to achieve diverse attack goals with extremely limited attack budgets. Through comprehensive experiments over three acknowledged benchmark datasets and four prominent GNNs in the most challenging and realistic scenario, we demonstrate the superior performance of our proposed G$^2$-SNIA over the existing state-of-the-art baselines. Moreover, by comparing G$^2$-SNIA with multiple white-box evasion baselines, we confirm its capacity to generate solutions comparable to those of the best adversaries.", "output": "Single Node Injection Label Specificity Attack on Graph Neural Networks via Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations along with several applications of its methods. The tutorial is divided into four parts: Firstly, we give an overview of the different forms of bias that can be addressed with ULTR methods.
Secondly, we present a comprehensive discussion of the latest estimation techniques in the ULTR field. Thirdly, we survey published results of ULTR in real-world applications. Fourthly, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications. This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new ULTR solutions or utilizing them in real-world applications.", "output": "Recent Advances in the Foundations and Applications of Unbiased Learning to Rank."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Normalizing flows are an established approach for modelling complex probability densities through invertible transformations from a base distribution. However, the accuracy with which the target distribution can be captured by the normalizing flow is strongly influenced by the topology of the base distribution. A mismatch between the topology of the target and the base can result in a poor performance, as is the case for multi-modal problems. A number of different works have attempted to modify the topology of the base distribution to better match the target, either through the use of Gaussian Mixture Models [Izmailov et al., 2020, Ardizzone et al., 2020, Hagemann and Neumayer, 2021] or learned accept/reject sampling [Stimper et al., 2022]. We introduce piecewise normalizing flows which divide the target distribution into clusters, with topologies that better match the standard normal base distribution, and train a series of flows to model complex multi-modal targets. The piecewise nature of the flows can be exploited to significantly reduce the computational cost of training through parallelization.
We demonstrate the performance of the piecewise flows using standard benchmarks and compare the accuracy of the flows to the approach taken in Stimper et al., 2022 for modelling multi-modal distributions.", "output": "Piecewise Normalizing Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Obtaining high-quality data for collaborative training of machine learning models can be a challenging task due to A) the regulatory concerns and B) lack of incentive to participate. The first issue can be addressed through the use of privacy enhancing technologies (PET), one of the most frequently used ones being differentially private (DP) training. The second challenge can be addressed by identifying which data points can be beneficial for model training and rewarding data owners for sharing this data. However, DP in deep learning typically adversely affects atypical (often informative) data samples, making it difficult to assess the usefulness of individual contributions. In this work we investigate how to leverage gradient information to identify training samples of interest in private training settings. We show that there exist techniques which are able to provide the clients with the tools for principled data selection even in strictest privacy settings.", "output": "Leveraging gradient-derived metrics for data selection and valuation in differentially private training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they struggle with convergence to local optima due to limited exploration.
On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from the OpenAI gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.", "output": "Rethinking Population-assisted Off-policy Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recommender system or crowdsourcing applications of online learning, a human's preferences or abilities are often a function of the algorithm's recent actions. Motivated by this, a significant line of work has formalized settings where an action's loss is a function of the number of times that action was recently played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity.
To more faithfully capture decay of human memory with time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss is a function of a \\emph{weighted} summation of the number of times that arm was played in the last $m$ timesteps. This WTB setting is intractable without further assumption. So we study it under Repeated Exposure Optimality (REO), a condition motivated by the literature on human physiology, which requires the existence of an action that when repetitively played will eventually yield smaller loss than any other sequence of actions. We study the minimization of the complete policy regret (CPR), which is the strongest notion of regret, in WTB under REO. Since $m$ is typically unknown, we assume we only have access to an upper bound $M$ on $m$. We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $O \\left( \\sqrt{KT} + (m+M)K \\right)$ CPR. Interestingly, up to an additive (in lieu of multiplicative) factor in $(m+M)K$, this recovers the classical guarantee for the simpler stochastic multi-armed bandit with traditional regret. We additionally show that in our setting, any algorithm will suffer additive CPR of $\\Omega \\left( mK + M \\right)$, demonstrating our result is nearly optimal. Our algorithm is computationally efficient, and we experimentally demonstrate its practicality and superiority over natural baselines.", "output": "Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many machine learning (ML) libraries are accessible online for ML practitioners. Typical ML pipelines are complex and consist of a series of steps, each of them invoking several ML libraries.
In this demo paper, we present ExeKGLib, a Python library that allows users with coding skills and minimal ML knowledge to build ML pipelines. ExeKGLib relies on knowledge graphs to improve the transparency and reusability of the built ML workflows, and to ensure that they are executable. We demonstrate the usage of ExeKGLib and compare it with conventional ML code to show its benefits.", "output": "ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct the trajectory conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities, by simply choosing appropriate masks at inference time. For example, the same MTM network can be used as a forward dynamics model, inverse dynamics model, or even an offline RL agent. Through extensive experiments in several continuous control tasks, we show that the same MTM network -- i.e. same weights -- can match or outperform specialized networks trained for the aforementioned capabilities. Additionally, we find that state representations learned by MTM can significantly accelerate the learning speed of traditional RL algorithms. Finally, in offline RL benchmarks, we find that MTM is competitive with specialized offline RL algorithms, despite MTM being a generic self-supervised learning method without any explicit RL components.
Code is available at", "output": "Masked Trajectory Models for Prediction, Representation, and Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes the results of SemEval 2023 task 7 -- Multi-Evidence Natural Language Inference for Clinical Trial Data (NLI4CT) -- consisting of 2 tasks, a Natural Language Inference (NLI) task, and an evidence selection task on clinical trial data. The proposed challenges require multi-hop biomedical and numerical reasoning, which are of significant importance to the development of systems capable of large-scale interpretation and retrieval of medical evidence, to provide personalized evidence-based care. Task 1, the entailment task, received 643 submissions from 40 participants, and Task 2, the evidence selection task, received 364 submissions from 23 participants. The tasks are challenging, with the majority of submitted systems failing to significantly outperform the majority class baseline on the entailment task, and we observe significantly better performance on the evidence selection task than on the entailment task. Increasing the number of model parameters leads to a direct increase in performance, far more significant than the effect of biomedical pre-training. Future works could explore the limitations of large models for generalization and numerical inference, and investigate methods to augment clinical datasets to allow for more rigorous testing and to facilitate fine-tuning. We envisage that the dataset, models, and results of this task will be useful to the biomedical NLI and evidence retrieval communities.
The dataset, competition leaderboard, and website are publicly available.", "output": "SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for Clinical Trial Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding the performance of machine learning (ML) models across diverse data distributions is critically important for reliable applications. Despite recent empirical studies positing a near-perfect linear correlation between in-distribution (ID) and out-of-distribution (OOD) accuracies, we empirically demonstrate that this correlation is more nuanced under subpopulation shifts. Through rigorous experimentation and analysis across a variety of datasets, models, and training epochs, we demonstrate that OOD performance often has a nonlinear correlation with ID performance in subpopulation shifts. Our findings, which contrast previous studies that have posited a linear correlation in model performance during distribution shifts, reveal a \"moon shape\" correlation (parabolic uptrend curve) between the test performance on the majority subpopulation and the minority subpopulation. This non-trivial nonlinear correlation holds across model architectures, hyperparameters, training durations, and the imbalance between subpopulations. Furthermore, we found that the nonlinearity of this \"moon shape\" is causally influenced by the degree of spurious correlations in the training data. Our controlled experiments show that stronger spurious correlation in the training data creates more nonlinear performance correlation. We provide complementary experimental and theoretical analyses for this phenomenon, and discuss its implications for ML reliability and fairness.
Our work highlights the importance of understanding the nonlinear effects of model improvement on performance in different subpopulations, and has the potential to inform the development of more equitable and responsible machine learning models.", "output": "On the nonlinear correlation of ML performance between data subpopulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cross-encoder models, which jointly encode and score a query-item pair, are typically prohibitively expensive for k-nearest neighbor search. Consequently, k-NN search is performed not with a cross-encoder, but with a heuristic retrieve (e.g., using BM25 or dual-encoder) and re-rank approach. Recent work proposes ANNCUR (Yadav et al., 2022) which uses CUR matrix factorization to produce an embedding space for efficient vector-based search that directly approximates the cross-encoder without the need for dual-encoders. ANNCUR defines this shared query-item embedding space by scoring the test query against anchor items which are sampled uniformly at random. While this minimizes average approximation error over all items, unsuitably high approximation error on top-k items remains and leads to poor recall of top-k (and especially top-1) items. Increasing the number of anchor items is a straightforward way of improving the approximation error and hence k-NN recall of ANNCUR but at the cost of increased inference latency. In this paper, we propose a new method for adaptively choosing anchor items that minimizes the approximation error for the practically important top-k neighbors for a query with minimal computational overhead. Our proposed method incrementally selects a suitable set of anchor items for a given test query over several rounds, using anchors chosen in previous rounds to inform selection of more anchor items.
Empirically, our method consistently improves k-NN recall as compared to both ANNCUR and the widely-used dual-encoder-based retrieve-and-rerank approach.", "output": "Adaptive Selection of Anchor Items for CUR-based k-NN search with Cross-Encoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and ask, 'does it matter?' We conduct the largest tabular data analysis to date, by comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than selecting the best algorithm. Next, we analyze 965 metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed feature distributions, heavy-tailed feature distributions, and other forms of dataset irregularities. Our insights act as a guide for practitioners to decide whether or not they need to run a neural net to reach top performance on their dataset.
Our codebase and all raw results are available at", "output": "When Do Neural Nets Outperform Boosted Trees on Tabular Data?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The study of code example recommendation has been conducted extensively in the past and recently in order to assist developers in their software development tasks. This is because developers often spend significant time searching for relevant code examples on the internet, utilizing open-source projects and informal documentation. For finding useful code examples, informal documentation, such as Stack Overflow discussions and forums, can be invaluable. We have focused our research on Stack Overflow, which is a popular resource for discussing different topics among software developers. For increasing the quality of the recommended code examples, we have collected and recommended the best code examples in the Java programming language. We have utilized BERT in our approach, which is a Large Language Model (LLM) for text representation that can effectively extract semantic information from textual data. Our first step involved using BERT to convert code examples into numerical vectors. Subsequently, we applied LSH to identify Approximate Nearest Neighbors (ANN). Our research involved the implementation of two variants of this approach, namely the Random Hyperplane-based LSH and the Query-Aware LSH. Our study compared two algorithms using four parameters: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. The results of our analysis revealed that the Query-Aware (QA) approach outperformed the Random Hyperplane-based (RH) approach in terms of HitRate. Specifically, the QA approach achieved a HitRate improvement of 20% to 35% for query pairs compared to the RH approach.
Creating hashing tables and assigning data samples to buckets using the QA approach is at least four times faster than the RH approach. The QA approach returns code examples within milliseconds, while it takes several seconds (sec) for the RH approach to recommend code examples.", "output": "Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Clustering is at the very core of machine learning, and its applications proliferate with the increasing availability of data. However, as datasets grow, comparing clusterings with an adjustment for chance becomes computationally difficult, preventing unbiased ground-truth comparisons and solution selection. We propose FastAMI, a Monte Carlo-based method to efficiently approximate the Adjusted Mutual Information (AMI) and extend it to the Standardized Mutual Information (SMI). The approach is compared with the exact calculation and a recently developed variant of the AMI based on pairwise permutations, using both synthetic and real data. In contrast to the exact calculation our method is fast enough to enable these adjusted information-theoretic comparisons for large datasets while maintaining considerably more accurate results than the pairwise approach.", "output": "FastAMI -- a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision which is challenging to gather in real world at scale.
Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interactions. In this paper, we automatically extract 3D supervision (via multiview 2D supervision) from such raw video data to scale up the learning of models for hand-held object reconstruction. This requires tackling two key challenges: unknown camera pose and occlusion. For the former, we use hand pose (predicted from existing techniques, e.g. FrankMocap) as a proxy for object pose. For the latter, we learn data-driven 3D shape priors using synthetic objects from the ObMan dataset. We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image. Our experiments on the MOW and HO3D datasets show the effectiveness of these supervisory signals at predicting the 3D shape for real-world hand-held objects without any direct real-world 3D supervision.", "output": "Learning Hand-Held Object Reconstruction from In-The-Wild Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computational notebooks such as Jupyter Notebook have become data scientists' de facto programming environments. Many visualization researchers and practitioners have developed interactive visualization tools that support notebooks. However, little is known about the appropriate design of visual analytics (VA) tools in notebooks. To bridge this critical research gap, we investigate the design strategies in this space by analyzing 159 notebook VA tools and their users' feedback. Our analysis encompasses 62 systems from academic papers and 103 systems sourced from a pool of 55k notebooks containing interactive visualizations that we obtain via scraping 8.6 million notebooks on GitHub.
We also examine findings from 15 user studies and user feedback in 379 GitHub issues. Through this work, we identify unique design opportunities and considerations for future notebook VA tools, such as using and manipulating multimodal data in notebooks as well as balancing the degree of visualization-notebook integration. Finally, we develop SuperNOVA, an open-source interactive tool to help researchers explore existing notebook VA tools and search for related work.", "output": "SuperNOVA: Design Strategies and Opportunities for Interactive Visualization in Computational Notebooks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many contemporary generative models of molecules are variational auto-encoders of molecular graphs. One term in their training loss pertains to reconstructing the input, yet reconstruction capabilities of state-of-the-art models have not yet been thoroughly compared on a large and chemically diverse dataset. In this work, we show that when several state-of-the-art generative models are evaluated under the same conditions, their reconstruction accuracy is surprisingly low, worse than what was previously reported on seemingly harder datasets. However, we show that improving reconstruction does not directly lead to better sampling or optimization performance. Failed reconstructions from the MoLeR model are usually similar to the inputs, assembling the same motifs in a different way, and possess similar chemical properties such as solubility.
Finally, we show that the input molecule and its failed reconstruction are usually mapped by the different encoders to statistically distinguishable posterior distributions, hinting that posterior collapse may not fully explain why VAEs are bad at reconstructing molecular graphs.", "output": "Are VAEs Bad at Reconstructing Molecular Graphs?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses.
Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.", "output": "Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance.
Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at ", "output": "Personalize Segment Anything Model with One Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep generative models have various content creation applications such as graphic design, e-commerce, and virtual Try-on. However, current works mainly focus on synthesizing realistic visual outputs, often ignoring other sensory modalities, such as touch, which limits physical interaction with users. In this work, we leverage deep generative models to create a multi-sensory experience where users can touch and see the synthesized object when sliding their fingers on a haptic surface. The main challenges lie in the significant scale discrepancy between vision and touch sensing and the lack of explicit mapping from touch sensing data to a haptic rendering device. To bridge this gap, we collect high-resolution tactile data with a GelSight sensor and create a new visuotactile clothing dataset. We then develop a conditional generative model that synthesizes both visual and tactile outputs from a single sketch.
We evaluate our method regarding image quality and tactile rendering accuracy. Finally, we introduce a pipeline to render high-quality visual and tactile outputs on an electroadhesion-based haptic device for an immersive experience, allowing for challenging materials and editable sketch inputs.", "output": "Controllable Visual-Tactile Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce $\\textbf{TCOW}$, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is to, given a video sequence, segment both the projected extent of the target object, as well as the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.", "output": "Tracking through Containers and Occluders in the Wild."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Typical deep visual recognition models are capable of performing the one task they were trained on. 
In this paper, we tackle the extremely difficult problem of combining completely distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then adds them together. While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks. Thus, we introduce \"ZipIt!\", a general method for merging two arbitrary models of the same architecture that incorporates two simple strategies. First, in order to account for features that aren't shared between models, we expand the model merging problem to additionally allow for merging features within each model by defining a general \"zip\" operation. Second, we add support for partially zipping the models up until a specified layer, naturally creating a multi-head model. We find that these two changes combined account for a staggering 20-60% improvement over prior work, making the merging of models trained on disjoint tasks feasible.", "output": "ZipIt! Merging Models from Different Tasks without Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we focus on the solution of online optimization problems that arise often in signal processing and machine learning, in which we have access to streaming sources of data. We discuss algorithms for online optimization based on the prediction-correction paradigm, both in the primal and dual space. In particular, we leverage the typical regularized least-squares structure appearing in many signal processing problems to propose a novel and tailored prediction strategy, which we call extrapolation-based. 
By using tools from operator theory, we then analyze the convergence of the proposed methods as applied both to primal and dual problems, deriving an explicit bound for the tracking error, that is, the distance from the time-varying optimal solution. We further discuss the empirical performance of the algorithm when applied to signal processing, machine learning, and robotics problems.", "output": "Extrapolation-based Prediction-Correction Methods for Time-varying Convex Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertex nomination is a lightly-supervised network information retrieval task in which vertices of interest in one graph are used to query a second graph to discover vertices of interest in the second graph. Similar to other information retrieval tasks, the output of a vertex nomination scheme is a ranked list of the vertices in the second graph, with the heretofore unknown vertices of interest ideally concentrating at the top of the list. Vertex nomination schemes provide a useful suite of tools for efficiently mining complex networks for pertinent information. In this paper, we explore, both theoretically and practically, the dual roles of content (i.e., edge and vertex attributes) and context (i.e., network topology) in vertex nomination. We provide necessary and sufficient conditions under which vertex nomination schemes that leverage both content and context outperform schemes that leverage only content or context separately. 
While the joint utility of both content and context has been demonstrated empirically in the literature, the framework presented in this paper provides a novel theoretical basis for understanding the potential complementary roles of network features and topology.", "output": "Vertex Nomination in Richly Attributed Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose the Graph Temporal Edge Aggregation (GTEA) framework for inductive learning on Temporal Interaction Graphs (TIGs). Different from previous works, GTEA models the temporal dynamics of interaction sequences in the continuous-time space and simultaneously takes advantage of both rich node and edge/interaction attributes in the graph. Concretely, we integrate a sequence model with a time encoder to learn pairwise interactional dynamics between two adjacent nodes. This helps capture complex temporal interactional patterns of a node pair along the history, which generates edge embeddings that can be fed into a GNN backbone. By aggregating features of neighboring nodes and the corresponding edge embeddings, GTEA jointly learns both topological and temporal dependencies of a TIG. In addition, a sparsity-inducing self-attention scheme is incorporated for neighbor aggregation, which highlights more important neighbors and suppresses trivial noises for GTEA. By jointly optimizing the sequence model and the GNN backbone, GTEA learns more comprehensive node representations capturing both temporal and graph structural characteristics. 
Extensive experiments on five large-scale real-world datasets demonstrate the superiority of GTEA over other inductive models.", "output": "GTEA: Inductive Representation Learning on Temporal Interaction Graphs via Temporal Edge Aggregation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantum Natural Language Processing (QNLP) deals with the design and implementation of NLP models intended to be run on quantum hardware. In this paper, we present results on the first NLP experiments conducted on Noisy Intermediate-Scale Quantum (NISQ) computers for datasets of size greater than 100 sentences. Exploiting the formal similarity of the compositional model of meaning by Coecke, Sadrzadeh and Clark (2010) with quantum theory, we create representations for sentences that have a natural mapping to quantum circuits. We use these representations to implement and successfully train NLP models that solve simple sentence classification tasks on quantum hardware. We conduct quantum simulations that compare the syntax-sensitive model of Coecke et al. with two baselines that use less or no syntax; specifically, we implement the quantum analogues of a \"bag-of-words\" model, where syntax is not taken into account at all, and of a word-sequence model, where only word order is respected. We demonstrate that all models converge smoothly both in simulations and when run on quantum hardware, and that the results are the expected ones based on the nature of the tasks and the datasets used. Another important goal of this paper is to describe in a way accessible to AI and NLP researchers the main principles, process and challenges of experiments on quantum hardware. 
Our aim in doing this is to take the first small steps in this unexplored research territory and pave the way for practical Quantum Natural Language Processing.", "output": "QNLP in Practice: Running Compositional Models of Meaning on a Quantum Computer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multiple Kernel Learning is a conventional way to learn the kernel function in kernel-based methods. MKL algorithms enhance the performance of kernel methods. However, these methods have a lower complexity compared to deep learning models and are inferior to these models in terms of recognition accuracy. Deep learning models can learn complex functions by applying nonlinear transformations to data through several layers. In this paper, we show that a typical MKL algorithm can be interpreted as a one-layer neural network with linear activation functions. By this interpretation, we propose a Neural Generalization of Multiple Kernel Learning (NGMKL), which extends the conventional multiple kernel learning framework to a multi-layer neural network with nonlinear activation functions. Our experiments on several benchmarks show that the proposed method improves the complexity of MKL algorithms and leads to higher recognition accuracy.", "output": "Neural Generalization of Multiple Kernel Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data sharing remains a major hindering factor when it comes to adopting emerging AI technologies in general, but particularly in the agri-food sector. Protectiveness of data is natural in this setting; data is a precious commodity for data owners, which if used properly can provide them with useful insights on operations and processes leading to a competitive advantage. 
Unfortunately, novel AI technologies often require large amounts of training data in order to perform well, something that in many scenarios is unrealistic. However, recent machine learning advances, e.g. federated learning and privacy-preserving technologies, can offer a solution to this issue via providing the infrastructure and underpinning technologies needed to use data from various sources to train models without ever sharing the raw data themselves. In this paper, we propose a technical solution based on federated learning that uses decentralized data (i.e. data that are not exchanged or shared but remain with the owners) to develop a cross-silo machine learning model that facilitates data sharing across supply chains. We focus our data sharing proposition on improving production optimization through soybean yield prediction, and provide potential use-cases that such methods can assist in other problem settings. Our results demonstrate that our approach not only performs better than each of the models trained on an individual data source, but also that data sharing in the agri-food sector can be enabled via alternatives to data exchange, whilst also helping to adopt emerging machine learning technologies to boost productivity.", "output": "The Role of Cross-Silo Federated Learning in Facilitating Data Sharing in the Agri-Food Sector."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new class of non-linear models for functional data based on neural networks. Deep learning has been very successful in non-linear modeling, but there has been little work done in the functional data setting. 
We propose two variations of our framework: a functional neural network with continuous hidden layers, called the Functional Direct Neural Network (FDNN), and a second version that utilizes basis expansions and continuous hidden layers, called the Functional Basis Neural Network (FBNN). Both are designed explicitly to exploit the structure inherent in functional data. To fit these models we derive a functional gradient based optimization algorithm. The effectiveness of the proposed methods in handling complex functional models is demonstrated by comprehensive simulation studies and real data examples.", "output": "Non-linear Functional Modeling using Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Selecting a minimal feature set that is maximally informative about a target variable is a central task in machine learning and statistics. Information theory provides a powerful framework for formulating feature selection algorithms -- yet, a rigorous, information-theoretic definition of feature relevancy, which accounts for feature interactions such as redundant and synergistic contributions, is still missing. We argue that this lack is inherent to classical information theory which does not provide measures to decompose the information a set of variables provides about a target into unique, redundant, and synergistic contributions. Such a decomposition has been introduced only recently by the partial information decomposition (PID) framework. Using PID, we clarify why feature selection is a conceptually difficult problem when approached using information theory and provide a novel definition of feature relevancy and redundancy in PID terms. From this definition, we show that the conditional mutual information (CMI) maximizes relevancy while minimizing redundancy and propose an iterative, CMI-based algorithm for practical feature selection. 
We demonstrate the power of our CMI-based algorithm in comparison to the unconditional mutual information on benchmark examples and provide corresponding PID estimates to highlight how PID allows to quantify information contribution of features and their interactions in feature-selection problems.", "output": "A Rigorous Information-Theoretic Definition of Redundancy and Relevancy in Feature Selection Based on (Partial) Information Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we propose a novel evolutionary algorithm for neural architecture search, applicable to global search spaces. The algorithm's architectural representation organizes the topology in multiple hierarchical modules, while the design process exploits this representation, in order to explore the search space. We also employ a curation system, which promotes the utilization of well performing sub-structures to subsequent generations. We apply our method to Fashion-MNIST and NAS-Bench101, achieving accuracies of $93.2%$ and $94.8%$ respectively in a relatively small number of generations.", "output": "A Novel Evolutionary Algorithm for Hierarchical Neural Architecture Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. 
In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task. The code and models are publicly available on", "output": "ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks from the same task distribution with a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. This becomes more difficult when only a limited number of tasks are available for training. In such a few-task few-shot setting, it is beneficial to explicitly preserve the local neighborhoods from the task manifold and exploit this to generate artificial tasks for training. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. 
We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds. We apply our framework to both model-agnostic meta-learning as well as prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains compared to current methods.", "output": "Interval Bound Interpolation for Few-shot Learning with Few Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Influence estimation analyzes how changes to the training data can lead to different model predictions; this analysis can help us better understand these predictions, the models making those predictions, and the data sets they're trained on. However, most influence-estimation techniques are designed for deep learning models with continuous parameters. Gradient-boosted decision trees (GBDTs) are a powerful and widely-used class of models; however, these models are black boxes with opaque decision-making processes. In the pursuit of better understanding GBDT predictions and generally improving these models, we adapt recent and popular influence-estimation methods designed for deep learning models to GBDTs. Specifically, we adapt representer-point methods and TracIn, denoting our new methods TREX and BoostIn, respectively; source code is available at We compare these methods to LeafInfluence and other baselines using 5 different evaluation measures on 22 real-world data sets with 4 popular GBDT implementations. These experiments give us a comprehensive overview of how different approaches to influence estimation work in GBDT models. We find BoostIn is an efficient influence-estimation method for GBDTs that performs equally well or better than existing work while being four orders of magnitude faster. 
Our evaluation also suggests the gold-standard approach of leave-one-out (LOO) retraining consistently identifies the single-most influential training example but performs poorly at finding the most influential set of training examples for a given target prediction.", "output": "Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies have uncovered that language model distillation is less effective when facing a large capacity gap between the teacher and the student, and introduced teacher assistant-based distillation to bridge the gap. As a connection, the scale and the performance of the teacher assistant are of vital importance to bring the knowledge from the teacher to the student. However, existing teacher assistant-based methods require maximally many trials before scheduling an optimal teacher assistant. To this end, we propose a minimal distillation schedule (MiniDisc) for scheduling the optimal teacher assistant in minimally one trial. In particular, motivated by the finding that the performance of the student is positively correlated to the scale-performance tradeoff of the teacher assistant, MiniDisc is designed with a $\\lambda$-tradeoff to measure the optimality of the teacher assistant without trial distillation to the student. MiniDisc then can schedule the optimal teacher assistant with the best $\\lambda$-tradeoff in a sandwich framework. MiniDisc is evaluated with an extensive set of experiments on GLUE. Experimental results demonstrate the improved efficiency of our MiniDisc compared to several state-of-the-art baselines. 
We further apply MiniDisc to a language model with billions of parameters and show its scalability.", "output": "MiniDisc: Minimal Distillation Schedule for Language Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria, and update the model to reach a new debiased state. However, in practice, end-users and practitioners might prefer to switch back to the original model, or apply debiasing only on a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach, consisting of stand-alone highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on-demand at inference time. Our approach draws from the concept of \\emph{diff} pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct experiments on three classification tasks with gender, race, and age as protected attributes. The results show that our modular approach, while maintaining task performance, improves (or at least remains on-par with) the effectiveness of bias mitigation in comparison with baseline finetuning. 
Particularly on a two-attribute dataset, our approach with separately learned debiasing subnetworks shows effective utilization of either or both the subnetworks for selective bias mitigation.", "output": "Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) has recently emerged as a distributed machine learning paradigm for systems with limited and intermittent connectivity. This paper presents the new context brought to FL by satellite constellations, where the connectivity patterns are significantly different from the ones observed in conventional terrestrial FL. The focus is on large constellations in low Earth orbit (LEO), where each satellite participates in a data-driven FL task using a locally stored dataset. This scenario is motivated by the trend towards mega constellations of interconnected small satellites in LEO and the integration of artificial intelligence in satellites. We propose a classification of satellite FL based on the communication capabilities of the satellites, the constellation design, and the location of the parameter server. A comprehensive overview of the current state-of-the-art in this field is provided and the unique challenges and opportunities of satellite FL are discussed. 
Finally, we outline several open research directions for FL in satellite constellations and present some future perspectives on this topic.", "output": "Federated Learning in Satellite Constellations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting when the transition model is Gaussian or Lipschitz, and demand a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assumptions on the target transition model to belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits a sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and target, allows distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). The aforementioned compression step is then computed in terms of greedily retaining only those samples which are more than a certain KSD away from the previous model estimate. 
Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to 50 percent reduction in wall clock time in some continuous control environments.", "output": "Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider a federated representation learning framework, where with the assistance of a central server, a group of $N$ distributed clients train collaboratively over their private data, for the representations (or embeddings) of a set of entities (e.g., users in a social network). Under this framework, for the key step of aggregating local embeddings trained privately at the clients, we develop a secure embedding aggregation protocol named scheme, which leverages all potential aggregation opportunities among all the clients, while providing privacy guarantees for the set of local entities and corresponding embeddings \\emph{simultaneously} at each client, against a curious server and up to $T < N/2$ colluding clients.", "output": "Secure Embedding Aggregation for Federated Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. 
Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90% with reasonable ($\\sim 1%$) accuracy loss, which is competitive with gradual compression methods. Additionally, CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware. The code for reproducing the results is available at .", "output": "CrAM: A Compression-Aware Minimizer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. 
Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.", "output": "DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controller synthesis is in essence a case of model-based planning for non-deterministic environments in which plans (actually ''strategies'') are meant to preserve system goals indefinitely. In the case of supervisory control, environments are specified as the parallel composition of state machines and valid strategies are required to be ''non-blocking'' (i.e., always enabling the environment to reach certain marked states) in addition to safe (i.e., keep the system within a safe zone). Recently, On-the-fly Directed Controller Synthesis techniques were proposed to avoid the exploration of the entire (and exponentially large) environment space, at the cost of non-maximal permissiveness, to either find a strategy or conclude that there is none. The incremental exploration of the plant is currently guided by a domain-independent human-designed heuristic. In this work, we propose a new method for obtaining heuristics based on Reinforcement Learning (RL). The synthesis algorithm is thus framed as an RL task with an unbounded action space and a modified version of DQN is used. With a simple and general set of features that abstracts both states and actions, we show that it is possible to learn heuristics on small versions of a problem that generalize to the larger instances, effectively doing zero-shot policy transfer. 
Our agents learn from scratch in a highly partially observable RL task and outperform the existing heuristic overall, in instances unseen during training.", "output": "Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Urban rail transit provides significant comprehensive benefits such as large traffic volume and high speed, serving as one of the most important components of urban traffic construction management and congestion solution. Using real passenger flow data of an Asian subway system from April to June of 2018, this work analyzes the space-time distribution of the passenger flow using short-term traffic flow prediction. Stations are divided into four types for passenger flow forecasting, and meteorological records are collected for the same period. Then, machine learning methods with different inputs are applied and multivariate regression is performed to evaluate the improvement effect of each weather element on passenger flow forecasting of representative metro stations on an hourly basis. Our results show that by inputting weather variables the precision of prediction on weekends is enhanced while the performance on weekdays improves only marginally, and the contributions of different weather elements differ. Also, different categories of stations are affected differently by weather. 
This study provides a possible method to furtherimprove other prediction models, and attests to the promise of data-drivenanalytics for optimization of short-term scheduling in transit management.", "output": "Exploring the impact of weather on Metro demand forecasting using machine learning method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rates of missing data often depend on record-keeping policies and thus maychange across times and locations, even when the underlying features arecomparatively stable. In this paper, we introduce the problem of DomainAdaptation under Missingness Shift (DAMS). Here, (labeled) source data and(unlabeled) target data would be exchangeable but for different missing datamechanisms. We show that if missing data indicators are available, DAMS reducesto covariate shift. Addressing cases where such indicators are absent, weestablish the following theoretical results for underreporting completely atrandom: (i) covariate shift is violated (adaptation is required); (ii) theoptimal linear source predictor can perform arbitrarily worse on the targetdomain than always predicting the mean; (iii) the optimal target predictor canbe identified, even when the missingness rates themselves are not; and (iv) forlinear models, a simple analytic adjustment yields consistent estimates of theoptimal target parameters. In experiments on synthetic and semi-synthetic data,we demonstrate the promise of our methods when assumptions hold. 
Finally, wediscuss a rich family of future extensions.", "output": "Domain Adaptation under Missingness Shift."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many datasets are biased, namely they contain easy-to-learn features that arehighly correlated with the target class only in the dataset but not in the trueunderlying distribution of the data. For this reason, learning unbiased modelsfrom biased data has become a very relevant research topic in the last years.In this work, we tackle the problem of learning representations that are robustto biases. We first present a margin-based theoretical framework that allows usto clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail whendealing with biased data. Based on that, we derive a novel formulation of thesupervised contrastive loss (epsilon-SupInfoNCE), providing more accuratecontrol of the minimal distance between positive and negative samples.Furthermore, thanks to our theoretical framework, we also propose FairKL, a newdebiasing regularization loss, that works well even with extremely biased data.We validate the proposed losses on standard vision datasets including CIFAR10,CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL withepsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biaseddatasets, including real instances of biases in the wild.", "output": "Unbiased Supervised Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning can generate black-box surrogate models which are bothextremely fast and highly accurate. Rigorously verifying the accuracy of theseblack-box models, however, is computationally challenging. 
When it comes topower systems, learning AC power flow is the cornerstone of any machinelearning surrogate model wishing to drastically accelerate computations,whether it is for optimization, control, or dynamics. This paper develops forthe first time, to our knowledge, a tractable neural network verificationprocedure which incorporates the ground truth of the non-linear AC power flowequations to determine worst-case neural network performance. Our approach,termed Sequential Targeted Tightening (STT), leverages a loosely convexifiedreformulation of the original verification problem, which is a mixed integerquadratic program (MIQP). Using the sequential addition of targeted cuts, weiteratively tighten our formulation until either the solution is sufficientlytight or a satisfactory performance guarantee has been generated. Afterlearning neural network models of the 14, 57, 118, and 200-bus PGLib testcases, we compare the performance guarantees generated by our STT procedurewith ones generated by a state-of-the-art MIQP solver, Gurobi 9.5. We show thatSTT often generates performance guarantees which are orders of magnitudetighter than the MIQP upper bound.", "output": "Global Performance Guarantees for Neural Network Models of AC Power Flow."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Separation of multiple singing voices into each voice is a rarely studiedarea in music source separation research. The absence of a benchmark datasethas hindered its progress. In this paper, we present an evaluation dataset andprovide baseline studies for multiple singing voices separation. First, weintroduce MedleyVox, an evaluation dataset for multiple singing voicesseparation. We specify the problem definition in this dataset by categorizingit into i) unison, ii) duet, iii) main vs. 
rest, and iv) N-singing separation. Second, to overcome the absence of existing multi-singing datasets for a training purpose, we present a strategy for construction of multiple singing mixtures using various single-singing datasets. Third, we propose the improved super-resolution network (iSRNet), which greatly enhances initial estimates of separation networks. Jointly trained with the Conv-TasNet and the multi-singing mixture construction strategy, the proposed iSRNet achieved comparable performance to ideal time-frequency masks on duet and unison subsets of MedleyVox. Audio samples, the dataset, and codes are available on our website (", "output": "MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a sparse algebra for samplet compressed kernel matrices, to enable efficient scattered data analysis. We show the compression of kernel matrices by means of samplets produces optimally sparse matrices in a certain S-format. It can be performed in cost and memory that scale essentially linearly with the matrix size $N$, for kernels of finite differentiability, along with addition and multiplication of S-formatted matrices. We prove and exploit the fact that the inverse of a kernel matrix (if it exists) is compressible in the S-format as well. Selected inversion allows direct computation of the entries in the corresponding sparsity pattern. The S-formatted matrix operations enable the efficient, approximate computation of more complicated matrix functions such as ${\\bm A}^\\alpha$ or $\\exp({\\bm A})$. The matrix algebra is justified mathematically by pseudo differential calculus. 
As an application, efficientGaussian process learning algorithms for spatial statistics is considered.Numerical results are presented to illustrate and quantify our findings.", "output": "Multiresolution kernel matrix algebra."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reasoning, as an essential ability for complex problem-solving, can provideback-end support for various real-world applications, such as medicaldiagnosis, negotiation, etc. This paper provides a comprehensive survey ofcutting-edge research on reasoning with language model prompting. We introduceresearch works with comparisons and summaries and provide systematic resourcesto help beginners. We also discuss the potential reasons for emerging suchreasoning abilities and highlight future research directions. Resources areavailable at (updatedperiodically).", "output": "Reasoning with Language Model Prompting: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For improving short-length codes, we demonstrate that classic decoders canalso be used with real-valued, neural encoders, i.e., deep-learning basedcodeword sequence generators. Here, the classical decoder can be a valuabletool to gain insights into these neural codes and shed light on weaknesses.Specifically, the turbo-autoencoder is a recently developed channel codingscheme where both encoder and decoder are replaced by neural networks. We firstshow that the limited receptive field of convolutional neural network(CNN)-based codes enables the application of the BCJR algorithm to optimallydecode them with feasible computational complexity. 
These maximum a posteriori(MAP) component decoders then are used to form classical (iterative) turbodecoders for parallel or serially concatenated CNN encoders, offering aclose-to-maximum likelihood (ML) decoding of the learned codes. To the best ofour knowledge, this is the first time that a classical decoding algorithm isapplied to a non-trivial, real-valued neural code. Furthermore, as the BCJRalgorithm is fully differentiable, it is possible to train, or fine-tune, theneural encoder in an end-to-end fashion.", "output": "Optimizing Serially Concatenated Neural Codes with Classical Decoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, the stochastic Polyak step size (SPS) has emerged as a competitiveadaptive step size scheme for stochastic gradient descent. Here we developProxSPS, a proximal variant of SPS that can handle regularization terms.Developing a proximal variant of SPS is particularly important, since SPSrequires a lower bound of the objective function to work well. When theobjective function is the sum of a loss and a regularizer, available estimatesof a lower bound of the sum can be loose. In contrast, ProxSPS only requires alower bound for the loss which is often readily available. As a consequence, weshow that ProxSPS is easier to tune and more stable in the presence ofregularization. Furthermore for image classification tasks, ProxSPS performs aswell as AdamW with little to no tuning, and results in a network with smallerweight parameters. 
We also provide an extensive convergence analysis forProxSPS that includes the non-smooth, smooth, weakly convex and strongly convexsetting.", "output": "A Stochastic Proximal Polyak Step Size."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Class-incremental learning (CIL) aims to train a classification model whilethe number of classes increases phase-by-phase. An inherent challenge of CIL isthe stability-plasticity tradeoff, i.e., CIL models should keep stable toretain old knowledge and keep plastic to absorb new knowledge. However, none ofthe existing CIL models can achieve the optimal tradeoff in differentdata-receiving settings--where typically the training-from-half (TFH) settingneeds more stability, but the training-from-scratch (TFS) needs moreplasticity. To this end, we design an online learning method that canadaptively optimize the tradeoff without knowing the setting as a priori.Specifically, we first introduce the key hyperparameters that influence thetrade-off, e.g., knowledge distillation (KD) loss weights, learning rates, andclassifier types. Then, we formulate the hyperparameter optimization process asan online Markov Decision Process (MDP) problem and propose a specificalgorithm to solve it. We apply local estimated rewards and a classic banditalgorithm Exp3 to address the issues when applying online MDP methods to theCIL protocol. 
Our method consistently improves top-performing CIL methods inboth TFH and TFS settings, e.g., boosting the average accuracy of TFH and TFSby 2.2 percentage points on ImageNet-Full, compared to the state-of-the-art.", "output": "Online Hyperparameter Optimization for Class-Incremental Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spectral bias is an important observation of neural network training, statingthat the network will learn a low frequency representation of the targetfunction before converging to higher frequency components. This property isinteresting due to its link to good generalization in over-parameterizednetworks. However, in low dimensional settings, a severe spectral bias occursthat obstructs convergence to high frequency components entirely. In order toovercome this limitation, one can encode the inputs using a high frequencysinusoidal encoding. Previous works attempted to explain this phenomenon usingNeural Tangent Kernel (NTK) and Fourier analysis. However, NTK does not capturereal network dynamics, and Fourier analysis only offers a global perspective onthe network properties that induce this bias. In this paper, we provide a novelapproach towards understanding spectral bias by directly studying ReLU MLPtraining dynamics. Specifically, we focus on the connection between thecomputations of ReLU networks (activation regions), and the speed of gradientdescent convergence. We study these dynamics in relation to the spatialinformation of the signal to understand how they influence spectral bias. 
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.", "output": "Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper provides several mathematical analyses of the diffusion model in machine learning. The drift term of the backwards sampling process is represented as a conditional expectation involving the data distribution and the forward diffusion. The training process aims to find such a drift function by minimizing the mean-squared residue related to the conditional expectation. Using small-time approximations of the Green's function of the forward diffusion, we show that the analytical mean drift function in DDPM and the score function in SGM asymptotically blow up in the final stages of the sampling process for singular data distributions such as those concentrated on lower-dimensional manifolds, and are therefore difficult to approximate by a network. To overcome this difficulty, we derive a new target function and associated loss, which remains bounded even for singular data distributions. We illustrate the theoretical findings with several numerical examples.", "output": "Mathematical analysis of singularities in the diffusion model under the submanifold assumption."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, pre-trained large language models have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. 
The underlying mechanisms by which this capability arises fromregular language model pretraining objectives remain poorly understood. In thisstudy, we aim to examine the in-context learning phenomenon through a Bayesianlens, viewing large language models as topic models that implicitly infertask-related information from demonstrations. On this premise, we propose analgorithm for selecting optimal demonstrations from a set of annotated data anddemonstrate a significant 12.5% improvement relative to the random selectionbaseline, averaged over eight GPT2 and GPT3 models on eight differentreal-world text classification datasets. Our empirical findings support ourhypothesis that large language models implicitly infer a latent conceptvariable.", "output": "Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Symbolic Regression (SR) can generate interpretable, concise expressions thatfit a given dataset, allowing for more human understanding of the structurethan black-box approaches. The addition of background knowledge (in the form ofsymbolic mathematical constraints) allows for the generation of expressionsthat are meaningful with respect to theory while also being consistent withdata. We specifically examine the addition of constraints to traditionalgenetic algorithm (GA) based SR (PySR) as well as a Markov-chain Monte Carlo(MCMC) based Bayesian SR architecture (Bayesian Machine Scientist), and applythese to rediscovering adsorption equations from experimental, historicaldatasets. 
We find that, while hard constraints prevent GA and MCMC SR fromsearching, soft constraints can lead to improved performance both in terms ofsearch effectiveness and model meaningfulness, with computational costsincreasing by about an order-of-magnitude. If the constraints do not correlatewell with the dataset or expected models, they can hinder the search ofexpressions. We find Bayesian SR is better these constraints (as the Bayesianprior) than by modifying the fitness function in the GA", "output": "Incorporating Background Knowledge in Symbolic Regression using a Computer Algebra System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Assortment optimization has received active explorations in the past fewdecades due to its practical importance. Despite the extensive literaturedealing with optimization algorithms and latent score estimation, uncertaintyquantification for the optimal assortment still needs to be explored and is ofgreat practical significance. Instead of estimating and recovering the completeoptimal offer set, decision-makers may only be interested in testing whether agiven property holds true for the optimal assortment, such as whether theyshould include several products of interest in the optimal set, or how manycategories of products the optimal set should include. This paper proposes anovel inferential framework for testing such properties. We consider the widelyadopted multinomial logit (MNL) model, where we assume that each customer willpurchase an item within the offered products with a probability proportional tothe underlying preference score associated with the product. 
We reduceinferring a general optimal assortment property to quantifying the uncertaintyassociated with the sign change point detection of the marginal revenue gaps.We show the asymptotic normality of the marginal revenue gap estimator, andconstruct a maximum statistic via the gap estimators to detect the sign changepoint. By approximating the distribution of the maximum statistic withmultiplier bootstrap techniques, we propose a valid testing procedure. We alsoconduct numerical experiments to assess the performance of our method.", "output": "Combinatorial Inference on the Optimal Assortment in Multinomial Logit Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning is valuable in many real-world applications, but learning ageneralizable model without overfitting to the few labeled datapoints ischallenging. In this work, we focus on Few-shot Learning with Auxiliary Data(FLAD), a training paradigm that assumes access to auxiliary data duringfew-shot learning in hopes of improving generalization. Previous works haveproposed automated methods for mixing auxiliary and target data, but thesemethods typically scale linearly (or worse) with the number of auxiliarydatasets, limiting their practicality. In this work we relate FLAD to theexplore-exploit dilemma that is central to the multi-armed bandit setting andderive algorithms whose computational complexity is independent of the numberof auxiliary datasets, allowing us to scale to 100x more auxiliary datasetsthan prior methods. We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- andcompare them with prior FLAD methods that either explore or exploit, findingthat the combination of exploration and exploitation is crucial. 
Throughextensive experimentation we find that our methods outperform all pre-existingFLAD methods by 4% and lead to the first 3 billion parameter language modelsthat outperform the 175 billion parameter GPT-3. Overall, our work suggeststhat the discovery of better, more efficient mixing strategies for FLAD mayprovide a viable path towards substantially improving generalization infew-shot learning.", "output": "Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in Transformers have come with a huge requirement oncomputing resources, highlighting the importance of developing efficienttraining techniques to make Transformer training faster, at lower cost, and tohigher accuracy by the efficient use of computation and memory resources. Thissurvey provides the first systematic overview of the efficient training ofTransformers, covering the recent progress in acceleration arithmetic andhardware, with a focus on the former. We analyze and compare methods that savecomputation and memory costs for intermediate tensors during training, togetherwith techniques on hardware/algorithm co-design. We finally discuss challengesand promising areas for future research.", "output": "A Survey on Efficient Training of Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning is known to perform efficiently in many applicationsempirically, yet limited literature reports the mechanism behind the scene.This study establishes both formal derivations and heuristic analysis toformulate the theory of transfer learning in deep learning. 
Our frameworkutilizing layer variational analysis proves that the success of transferlearning can be guaranteed with corresponding data conditions. Moreover, ourtheoretical calculation yields intuitive interpretations towards the knowledgetransfer process. Subsequently, an alternative method for network-basedtransfer learning is derived. The method shows an increase in efficiency andaccuracy for domain adaptation. It is particularly advantageous when new domaindata is sufficiently sparse during adaptation. Numerical experiments overdiverse tasks validated our theory and verified that our analytic expressionachieved better performance in domain adaptation than the gradient descentmethod.", "output": "Interpretations of Domain Adaptations via Layer Variational Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Analysis of Electrochemical Impedance Spectroscopy (EIS) data forelectrochemical systems often consists of defining an Equivalent Circuit Model(ECM) using expert knowledge and then optimizing the model parameters todeconvolute various resistance, capacitive, inductive, or diffusion responses.For small data sets, this procedure can be conducted manually; however, it isnot feasible to manually define a proper ECM for extensive data sets with awide range of EIS responses. Automatic identification of an ECM wouldsubstantially accelerate the analysis of large sets of EIS data. We showcasemachine learning methods to classify the ECMs of 9,300 impedance spectraprovided by QuantumScape for the BatteryDEV hackathon. The best-performingapproach is a gradient-boosted tree model utilizing a library to automaticallygenerate features, followed by a random forest model using the raw spectraldata. 
A convolutional neural network using boolean images of Nyquist representations is presented as an alternative, although it achieves a lower accuracy. We publish the data and open source the associated code. The approaches described in this article can serve as benchmarks for further studies. A key remaining challenge is the identifiability of the labels, underlined by the model performances and the comparison of misclassified spectra.", "output": "Machine Learning Benchmarks for the Classification of Equivalent Circuit Models from Electrochemical Impedance Spectra."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of detecting the correlation between two Gaussian databases $\\mathsf{X}\\in\\mathbb{R}^{n\\times d}$ and $\\mathsf{Y}\\in\\mathbb{R}^{n\\times d}$, each composed of $n$ users with $d$ features. This problem is relevant in the analysis of social media, computational biology, etc. We formulate this as a hypothesis testing problem: under the null hypothesis, these two databases are statistically independent. Under the alternative, however, there exists an unknown permutation $\\sigma$ over the set of $n$ users (or, row permutation), such that $\\mathsf{X}$ is $\\rho$-correlated with $\\mathsf{Y}^\\sigma$, a permuted version of $\\mathsf{Y}$. We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of $n$ and $d$. Specifically, we prove that if $\\rho^2 d\\to 0$, as $d\\to\\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, irrespective of the value of $n$. This complements the performance of a simple test that thresholds the sum of all entries of $\\mathsf{X}^T\\mathsf{Y}$. 
Furthermore, when $d$ is fixed, we prove that strong detection (vanishing error probability) is impossible for any $\\rho<\\rho^\\star$, where $\\rho^\\star$ is an explicit function of $d$, while weak detection is again impossible as long as $\\rho^2 d\\to 0$. These results close significant gaps in recent related studies.", "output": "Phase Transitions in the Detection of Correlated Databases."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To generalize across tasks, an agent should acquire knowledge from past tasks that facilitates adaptation and exploration in future tasks. We focus on the problem of in-context adaptation and exploration, where an agent only relies on context, i.e., history of states, actions and/or rewards, rather than gradient-based updates. Posterior sampling (extension of Thompson sampling) is a promising approach, but it requires Bayesian inference and dynamic programming, which often involve unknowns (e.g., a prior) and costly computations. To address these difficulties, we use a transformer to learn an inference process from training tasks and consider a hypothesis space of partial models, represented as small Markov decision processes that are cheap for dynamic programming. In our version of the Symbolic Alchemy benchmark, our method's adaptation speed and exploration-exploitation balance approach those of an exact posterior sampling oracle. 
We also show that even though partialmodels exclude relevant information from the environment, they can neverthelesslead to good policies.", "output": "Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we present a novel physics-based data-driven framework forreduced-order modeling of laser ignition in a model rocket combustor based onparameterized neural ordinary differential equations (PNODE). Deep neuralnetworks are embedded as functions of high-dimensional parameters of laserignition to predict various terms in a 0D flow model including the heat sourcefunction, pre-exponential factors, and activation energy. Using the governingequations of a 0D flow model, our PNODE needs only a limited number of trainingsamples and predicts trajectories of various quantities such as temperature,pressure, and mass fractions of species while satisfying physical constraints.We validate our physics-based PNODE on solution snapshots of high-fidelityComputational Fluid Dynamics (CFD) simulations of laser-induced ignition in aprototype rocket combustor. 
We compare the performance of our physics-basedPNODE with that of kernel ridge regression and fully connected neural networks.Our results show that our physics-based PNODE provides solutions with lowermean absolute errors of average temperature over time, thus improving theprediction of successful laser ignition with high-dimensional parameters.", "output": "Physics-based parameterized neural ordinary differential equations: prediction of laser ignition in a rocket combustor."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whole Slide Images (WSIs) or histopathology images are used in digitalpathology. WSIs pose great challenges to deep learning models for clinicaldiagnosis, owing to their size and lack of pixel-level annotations. With therecent advancements in computational pathology, newer multiple-instancelearning-based models have been proposed. Multiple-instance learning for WSIsnecessitates creating patches and uses the encoding of these patches fordiagnosis. These models use generic pre-trained models (ResNet-50 pre-trainedon ImageNet) for patch encoding. The recently proposed KimiaNet, a DenseNet121model pre-trained on TCGA slides, is a domain-specific pre-trained model. Thispaper shows the effect of domain-specific pre-training on WSI classification.To investigate the effect of domain-specific pre-training, we considered thecurrent state-of-the-art multiple-instance learning models, 1) CLAM, anattention-based model, and 2) TransMIL, a self-attention-based model, andevaluated the models' confidence and predictive performance in detectingprimary brain tumors - gliomas. Domain-specific pre-training improves theconfidence of the models and also achieves a new state-of-the-art performanceof WSI-based glioma subtype classification, showing a high clinicalapplicability in assisting glioma diagnosis. 
We will publicly share our codeand experimental results at", "output": "Domain-Specific Pre-training Improves Confidence in Whole Slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep unsupervised approaches are gathering increased attention forapplications such as pathology detection and segmentation in medical imagessince they promise to alleviate the need for large labeled datasets and aremore generalizable than their supervised counterparts in detecting any kind ofrare pathology. As the Unsupervised Anomaly Detection (UAD) literaturecontinuously grows and new paradigms emerge, it is vital to continuouslyevaluate and benchmark new methods in a common framework, in order to reassessthe state-of-the-art (SOTA) and identify promising research directions. To thisend, we evaluate a diverse selection of cutting-edge UAD methods on multiplemedical datasets, comparing them against the established SOTA in UAD for brainMRI. Our experiments demonstrate that newly developed feature-modeling methodsfrom the industrial and medical literature achieve increased performancecompared to previous work and set the new SOTA in a variety of modalities anddatasets. Additionally, we show that such methods are capable of benefitingfrom recently developed self-supervised pre-training algorithms, furtherincreasing their performance. Finally, we perform a series of experiments inorder to gain further insights into some unique characteristics of selectedmodels and datasets. 
Our code can be found under", "output": "Unsupervised Pathology Detection: A Deep Dive Into the State of the Art."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Determining causal effects of interventions onto outcomes from real-world, observational (non-randomized) data, e.g., treatment repurposing using electronic health records, is challenging due to underlying bias. Causal deep learning has improved over traditional techniques for estimating individualized treatment effects (ITE). We present the Doubly Robust Variational Information-theoretic Deep Adversarial Learning (DR-VIDAL), a novel generative framework that combines two joint models of treatment and outcome, ensuring an unbiased ITE estimation even when one of the two is misspecified. DR-VIDAL integrates: (i) a variational autoencoder (VAE) to factorize confounders into latent variables according to causal assumptions; (ii) an information-theoretic generative adversarial network (Info-GAN) to generate counterfactuals; (iii) a doubly robust block incorporating treatment propensities for outcome predictions. On synthetic and real-world datasets (Infant Health and Development Program, Twin Birth Registry, and National Supported Work Program), DR-VIDAL achieves better performance than other non-generative and generative methods. 
In conclusion, DR-VIDAL uniquely fuses causal assumptions, VAE, Info-GAN, and doubly robustness into a comprehensive, performant framework. Code is available at: MIT license.", "output": "DR-VIDAL -- Doubly Robust Variational Information-theoretic Deep Adversarial Learning for Counterfactual Prediction and Treatment Effect Estimation on Real World Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of the real world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm that is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experiment results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. 
Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for KITTI dataset. The code is available at ", "output": "DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in generative artificial intelligence (AI) have captured worldwide attention. Tools such as Dalle-2 and ChatGPT suggest that tasks previously thought to be beyond the capabilities of AI may now augment the productivity of creative media in various new ways, including through the generation of synthetic video. This research paper explores the utility of using AI-generated synthetic video to create viable educational content for online educational settings. To date, there is limited research investigating the real-world educational value of AI-generated synthetic media. To address this gap, we examined the impact of using AI-generated synthetic video in an online learning platform on both learners' content acquisition and learning experience. We took a mixed-method approach, randomly assigning adult learners (n=83) into one of two micro-learning conditions, collecting pre- and post-learning assessments, and surveying participants on their learning experience. The control condition included a traditionally produced instructor video, while the experimental condition included a synthetic video with a realistic AI-generated character. The results show that learners in both conditions demonstrated significant improvement from pre- to post-learning (p<.001), with no significant differences in gains between the two conditions (p=.80). In addition, no differences were observed in how learners perceived the traditional and synthetic videos. 
These findings suggest that AI-generated synthetic learning videos have the potential to be a viable substitute for videos produced via traditional methods in online educational settings, making high quality educational content more accessible across the globe.", "output": "Generative AI for learning: Investigating the potential of synthetic learning videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We provide a unified framework for characterizing pure and approximate differentially private (DP) learnability. The framework uses the language of graph theory: for a concept class $\\mathcal{H}$, we define the contradiction graph $G$ of $\\mathcal{H}$. Its vertices are realizable datasets, and two datasets $S,S'$ are connected by an edge if they contradict each other (i.e., there is a point $x$ that is labeled differently in $S$ and $S'$). Our main finding is that the combinatorial structure of $G$ is deeply related to learning $\\mathcal{H}$ under DP. Learning $\\mathcal{H}$ under pure DP is captured by the fractional clique number of $G$. Learning $\\mathcal{H}$ under approximate DP is captured by the clique number of $G$. Consequently, we identify graph-theoretic dimensions that characterize DP learnability: the clique dimension and fractional clique dimension. Along the way, we reveal properties of the contradiction graph which may be of independent interest. We also suggest several open questions and directions for future research.", "output": "A Unified Characterization of Private Learnability via Graph Theory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations. 
A common approach of the existing studies for unsupervised online story discovery is to represent news articles with symbolic- or graph-based embedding and incrementally cluster them into stories. Recent large language models are expected to improve the embedding further, but a straightforward adoption of the models by indiscriminately encoding all information in articles is ineffective to deal with text-rich and evolving news streams. In this work, we propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories by considering their shared temporal themes. To realize the idea for unsupervised online story discovery, a scalable framework USTORY is introduced with two main techniques, theme- and time-aware dynamic embedding and novelty-aware adaptive clustering, fueled by lightweight story summaries. A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performances than baselines while being robust and scalable to various streaming settings.", "output": "Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA). 
In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at:", "output": "Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating issue, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to get the quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. 
To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. Audio samples are available at", "output": "NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks have demonstrated remarkable performance in supervised learning tasks but require large amounts of labeled data. Self-supervised learning offers an alternative paradigm, enabling the model to learn from data without explicit labels. Information theory has been instrumental in understanding and optimizing deep neural networks. Specifically, the information bottleneck principle has been applied to optimize the trade-off between compression and relevant information preservation in supervised settings. However, the optimal information objective in self-supervised learning remains unclear. In this paper, we review various approaches to self-supervised learning from an information-theoretic standpoint and present a unified framework that formalizes the self-supervised information-theoretic learning problem. We integrate existing research into a coherent framework, examine recent self-supervised methods, and identify research opportunities and challenges. Moreover, we discuss empirical measurement of information-theoretic quantities and their estimators. 
This paper offers a comprehensive review of the intersection between information theory, self-supervised learning, and deep neural networks.", "output": "To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The aim of this paper is to investigate the connection between learning trajectories of the Deep Neural Networks (DNNs) and their corresponding generalization capabilities when being optimized with broadly used gradient descent and stochastic gradient descent algorithms. In this paper, we construct Linear Approximation Function to model the trajectory information and we propose a new generalization bound with richer trajectory information based on it. Our proposed generalization bound relies on the complexity of learning trajectory and the ratio between the bias and diversity of training set. Experimental results indicate that the proposed method effectively captures the generalization trend across various training steps, learning rates, and label noise levels.", "output": "Learning Trajectories are Generalization Indicators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. 
Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentage points, which is trained with the same compute budget as our model. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.", "output": "DataComp: In search of the next generation of multimodal datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate 3D object detection in all weather conditions remains a key challenge to enable the widespread deployment of autonomous vehicles, as most work to date has been performed on clear weather data. In order to generalize to adverse weather conditions, supervised methods perform best if trained from scratch on all weather data instead of finetuning a model pretrained on clear weather data. 
Training from scratch on all data will eventually become computationally infeasible and expensive as datasets continue to grow and encompass the full extent of possible weather conditions. On the other hand, naive finetuning on data from a different weather domain can result in catastrophic forgetting of the previously learned domain. Inspired by the success of replay-based continual learning methods, we propose Gradient-based Maximally Interfered Retrieval (GMIR), a gradient based sampling strategy for replay. During finetuning, GMIR periodically retrieves samples from the previous domain dataset whose gradient vectors show maximal interference with the gradient vector of the current update. Our 3D object detection experiments on the SeeingThroughFog (STF) dataset show that GMIR not only overcomes forgetting but also offers competitive performance compared to scratch training on all data with a 46.25% reduction in total training time.", "output": "Gradient-based Maximally Interfered Retrieval for Domain Incremental 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide here new algorithms that explicitly consider and adapt to the relationship between problem instances and experts' knowledge. We first propose and highlight the drawbacks of a naive approach based on nearest neighbor queries. To address these drawbacks we then introduce a novel algorithm - expertise trees - that constructs decision trees enabling the learner to select appropriate models. 
We provide theoretical insights and empirically validate the improved performance of our novel approach on a range of problems for which existing methods proved to be inadequate.", "output": "Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most exploration research on reinforcement learning (RL) has paid attention to `the way of exploration', which is `how to explore'. The other exploration research, `when to explore', has not been the main focus of RL exploration research. The issue of `when' of a monolithic exploration in the usual RL exploration behaviour binds an exploratory action to an exploitational action of an agent. Recently, a non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals. The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously. We describe the initial research of an autonomous multi-mode exploration of non-monolithic behaviour in an options framework. The higher performance of our method is shown against the existing non-monolithic exploration method through comparative experimental results.", "output": "An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. 
PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, \"EmpiricalBench,\" to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.", "output": "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. Despite recent progress in the field, existing dataset distillation methods fail to generalize to new architectures and scale to high-resolution datasets. To overcome the above issues, we propose to use the learned prior from pre-trained deep generative models to synthesize the distilled data. To achieve this, we present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space. 
Our method augments existing techniques, significantly improving cross-architecture generalization in all settings.", "output": "Generalizing Dataset Distillation via Deep Generative Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning methods have advanced quickly in brain imaging analysis over the past few years, but they are usually restricted by the limited labeled data. Pre-trained model on unlabeled data has presented promising improvement in feature learning in many domains, including natural language processing and computer vision. However, this technique is under-explored in brain network analysis. In this paper, we focused on pre-training methods with Transformer networks to leverage existing unlabeled data for brain functional network classification. First, we proposed a Transformer-based neural network, named as BrainNPT, for brain functional network classification. The proposed method leveraged <cls> token as a classification embedding vector for the Transformer model to effectively capture the representation of brain network. Second, we proposed a pre-training architecture with two pre-training strategies for BrainNPT model to leverage unlabeled brain network data to learn the structure information of brain networks. The results of classification experiments demonstrated the BrainNPT model without pre-training achieved the best performance with the state-of-the-art models, and the BrainNPT model with pre-training strongly outperformed the state-of-the-art models. The pre-training BrainNPT model improved 8.75% of accuracy compared with the model without pre-training. 
We further compared the pre-training strategies, analyzed the influence of the parameters of the model, and interpreted the fine-tuned model.", "output": "BrainNPT: Pre-training of Transformer networks for brain network classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work proposes an algorithm for explicitly constructing a pair of neural networks that linearize and reconstruct an embedded submanifold, from finite samples of this manifold. Our such-generated neural networks, called flattening networks (FlatNet), are theoretically interpretable, computationally feasible at scale, and generalize well to test data, a balance not typically found in manifold-based learning methods. We present empirical results and comparisons to other models on synthetic high-dimensional manifold data and 2D image data. Our code is publicly available.", "output": "Representation Learning via Manifold Flattening and Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of antibody engineering, an essential task is to design a novel antibody whose paratopes bind to a specific antigen with correct epitopes. Understanding antibody structure and its paratope can facilitate a mechanistic understanding of its function. Therefore, antibody structure prediction from its sequence alone has always been a highly valuable problem for de novo antibody design. 
AlphaFold2, a breakthrough in the field of structural biology, provides a solution to predict protein structure based on protein sequences and computationally expensive coevolutionary multiple sequence alignments (MSAs). However, the computational efficiency and undesirable prediction accuracy of antibodies, especially on the complementarity-determining regions (CDRs) of antibodies limit their applications in the industrially high-throughput drug design. To learn an informative representation of antibodies, we employed a deep antibody language model (ALM) on curated sequences from the observed antibody space database via a transformer model. We also developed a novel model named xTrimoABFold to predict antibody structure from antibody sequence based on the pretrained ALM as well as efficient evoformers and structural modules. The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss. xTrimoABFold outperforms AlphaFold2 and other protein language model based SOTAs, e.g., OmegaFold, HelixFold-Single, and IgFold with a large significant margin (30+% improvement on RMSD) while performing 151 times faster than AlphaFold2. To the best of our knowledge, xTrimoABFold achieved state-of-the-art antibody structure prediction. Its improvement in both accuracy and efficiency makes it a valuable tool for de novo antibody design and could make further improvements in immuno-theory.", "output": "xTrimoABFold: De novo Antibody Structure Prediction without MSA."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method for learning topology-preserving data representations (dimensionality reduction). 
The method aims to provide topological similarity between the data manifold and its latent representation via enforcing the similarity in topological features (clusters, loops, 2D voids, etc.) and their localization. The core of the method is the minimization of the Representation Topology Divergence (RTD) between original high-dimensional data and low-dimensional representation in latent space. RTD minimization provides closeness in topological features with strong theoretical guarantees. We develop a scheme for RTD differentiation and apply it as a loss term for the autoencoder. The proposed method \"RTD-AE\" better preserves the global structure and topology of the data manifold than state-of-the-art competitors as measured by linear correlation, triplet distance ranking accuracy, and Wasserstein distance between persistence barcodes.", "output": "Learning Topology-Preserving Data Representations."}] \ No newline at end of file